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(54) Database management system with Improved Indexed accessing 

(57) A database management system (DBMS) pro- 

\nded with a multi-dimensional improved indexed _ 

accessing capability using Iceyed index searching, indi- 
vidual searcin keys are constructed from general 
expression statements created in tlie DBMS compiler 
from search queries supplied to the DBMS. Each i^ey 
column represents another dimension, and both ranges 
and IN lists can be specified in the search query and 
used as the predicate values in multiple columns. Miss- 
ing predicate values in the search query are interpreted 
as a specification of the minimum and maximum values 
for the associated search key column. 

During compile time, the DBMS compiler produces 
general expressions to be used by the DBMS executor 
during run time to create the search keys. The DBMS 
compiler evaluates search queries by associating pred- 
icates with clusters and disjunct numbers assigned to 
each individual disjunct in the search query expression. 
The DBMS executor uses the general expression from 
the compiler and eliminates any conflicts among same 
column predicates, removes redundancies in predicate 
values and disjuncts and reduces the number of records 
to be accessed to the minimum required to complete the 
search query. The individual search keys are generated 
in the same order as the index to which the search per- 
tains (i.e., increasing or decreasing order). 
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Description 

BACKGROUND OF THE INVENTION 

5 This invention relates generally to relational database management systems, and particularly to a relational data- 
base management system with improved indexed accessing. 

Most relational database management systems (DBMS) use B-Trees to allow users to perform queries on large 
databases or tables using appropriate commands, such as SQL (structured query language) commands. A B-Tree is 
an index residing in the database memory and having one or more columns, with each column representing another 

10 dimension in the index. B-Tre^ permit searching for records in a database using one or more keys specified by users 
by means of appropriate query commands. Because the keys define a subset of an entire table of records, indexed 
searching can eliminate the need to search through the entire table of records in order to retrieve a much smaller subset 
of such records pertinent to the user query. 

Ideally, for most queries, indexed accessing enables the pertinent records to be accessed at minimum transactional 

IS cost (the cost being measured by an appropriate parameter, such as the equivalent number of secondary storage 
accesses as required to complete the query). However, many user queries presented to a DBMS using a B-Tree index 
structure cannot be processed using an indexed access technique. In such cases, the entire table of records must be 
searched, which is time consuming and relatively inefficient. In addition, the extent to which the total number of records 
required to be searched can be reduced is severely limited in a B-Tree index structure by virtue of the constraint that 

20 only one key column in an index can be used to specify a range of values or a specific list of values (termed an IN list). 
For example, consider a table T having columns (a, b. c, d, e, f, g) and a correspoding index I having columns (a, b, c, 
d). A user query requesting that the DBMS select all records for the predicates; 

where a = 10 

25 and b between 20 and 30 

and G = 40 
and d = 50 

would result in begin-end keys of the form: 

30 

begin key: a = 1 0, b >- 20 
end key:a = 10,b<=30 

Even though the predicates specify c = 40 and d = 50. these two equality predicates on key columns c and d are not 
35 incorporated in the search key due to the range on b. As a result, a search in the index for records conforming to the 

predicates wnll be able to seek to the nearest (a, b) pair, but cannot position precisely on the desired (a, b, c, d) values. 

Stated differently, all of the c and d values in the index must be examined and compared with the specific c and d values 

desired. Depending on the number of records in the c and d columns, the length of time required to perform the index 

search (and thus the cost), at best, is greater than optimal. 
40 Another disadvantage with existing B-Tree index structures is the constraint imposed when a key column predicate 

is missing (unspecified). For example, consider the predicates: 

where a = 10 
and 0=40 
45 and d = 50 

The begin-end key constructed by the traditional key building method is as follows: 

begin key: a = 10 
so end key: a = 10 

In this case, the absence of a predicate on column b prevents the predicates on columns c and d from being used. Con- 
sequently, the index table must be searched for ail values (b, c, d). A special case occurs when the missing key predi- 
cate is on the first column of a partitioning key. For example, consider the predicates: 

55 

where b = 30 
and c = 40 
and d = 50 
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The tradifional key building method cannot construct a search key from these predicates because a predicate on the 
first column a of the partitioning key is missing. As a result, the query execution plan composed by the DBMS must per- 
form a full table scan. 

Still another disadvantage inherent in the B-Tree index structure results when the search query includes an inter- 
5 vening iN predicate. For example, consider the predicates: 



where a = 10 

and b in (20, 30) 

and c - 40 

10 and d = 50 



The IN predicate on column b specifies an equality comparison with a set of values. The traditional key building method 
is capable of dealing with an equality comparison that involves exactly one value, but not a set of values. Consequently, 
the IN predicate cannot be used in the key. As a result, the search key contains only column a: 

begin key: a = 10 
end key: a = 10 

The absence of a usable key predicate on column b prevents the predicates on columns c and d from being used. As 
20 a result, a complete search of columns b, c, d must be conducted. 

Another disadvantage with existing B-Tree index-based DBMSs results from the mandatory use of the disjunctive 
normal form for the predicates of the query commands. More particularly, predicates are in disjunctive nornial form 
when only ORs exist at the outer level. For example, consider the query command: 

^ ■. . - . SEIjECT - ^ ^ ^ 

FROM T 

WHERE (a = 5 and ( (b=l and c IN (2,4,9)) 

30 OR (b=8 and c=7)) 

OR (a between 4 and 6 

and ( (b between 8 and 10 and 

c between 6 and 9) 
OR (b=9 and c=ll) 



The disjuncts for this expression are: 
40 (a=:5 and b=1 and c IN (2,4,9)) 

OR (a=5 and b=8 and c=7) 

OR (a>=4 and a<=6 and b>=8 and b<=1 0 and c>=6 and c<=9) 
OR (a>=4 and a<=6 and b-9 and 0=11) 

45 

in a typical known DBMS, the search for all records pertinent to the query command would begin with the first disjunct. 
Once all such records have been found, the search would be conducted for all records pertinent to the second disjunct, 
followed by a search for all records pertinent to the third disjunct, and ending virith all records pertinent to the fourth dis- 
junct. This technique is inefficient due to the fact that the single record in the second disjunct is accessed again during 
so the search for records pertinent to the third disjunct. Although the above example requires only one repetitive reading 
of the same record, in practice relatively large numbers of repetitions occur routinely when accessing records in rela- 
tively large databases. 

Further compounding the repetitive record problem Is the fact that a record cannot be returned twice to a user dur- 
ing an access due to semantic constraints imposed by the system. Consequently, most DBMSs must create ai table of 
55 records that have been read during an access, which can require a substantial amount of memory space, depending 
upon the number of records specified by the search query. 

This problem of repetitive record accessing is sometimes exacerbated by the appearance of conflicting predicates 
in a query command. While a user rarely submits a query command with conflicting predicates, the problem can fre- 
quently arise when views with hidden predicates are used or when host variables are used in combination with fixed 
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values, or when queries are generated by software. For ejcample, assume the following view exists: 
Create view VT as 

SELECT * from T where b IN (3. 9. 16, 25. 36); 
Assume the user query is: 
5 SELECT* from VT 

WHERE 

b between 20 and 30 
AND clN(4a 100. 150) 
10 AND d=50; 

In this example, the user has unknowingly required the DBMS to search all values of b in the view lying in the range 
between 20 and 30, when there is only one value b in the view lying within this range. As a result, all values of b between 
20 and 30 will be accessed, resulting in 30 unnecessary records (10 unnecessary b values times 3 c values). 

15 

SUMMARY OF THE INVENTION 

The invention comprises a method and system for affording improved indexed accessing of records in a relational 
dat^ase management system which is capable of substantially reducing the total number of records accessed in a 

20 given search, which permits ranges and IN lists on multiple columns of a search key, which constructe a usable search 
key in the absence of one or more column values, and which orders the individual columns of the search key in the 
same manner as the table to which the key pertains. 

The invention is carried out by means of a portion of a DBMS compiler termed the Optimizer and a DBMS Executor. 
The Optimizer is a process component of a DBMS compiler which initially evaluates a search query and generates key 

25 expressions for the DBMS Executor. The key expressions describe multi-column keys, including the range and IN list 
predicates on the individual columns. The Optimizer performs general OR optimization and associates query statement 
predicates with clusters and disjunct numbers. The use of disjunct numbers in the Optimizer allows the invention to min- 
imize memory space usage for the predicates and disjuncts, since predicates are not repeated: rather, a list of disjunct 
numbers in which each predicate appears is created. Predicates may share a common set of disjunct numbers. IN lists 

30 are treated as a single disjunct to minimize the number of disjuncts. 

The DBMS Executor comprises a set of procedures in the DBMS system library that executes compiled query 
statements against database tables, views or catalogs. The DBMS Executor evaluates the key expressions supplied by 
the Optimizer portion of the DBMS compiler in order to create a data structure termed a GEM-tree. Each GEM-tree con- 
tains information concerning key columns, describing ranges and exact values, predicates defined on each column, 

35 comparison operators and other information. The process of building the GEM^ree conducted in the DBMS Executor 
includes combining ranges and eliminating duplicates on the key columns, while preserving the order in each column 
(i.e., ascending order or descending order). After a GEM-tree has been constructed by the DBMS Executor, values are 
retrieved from the tree to build the actual keys for reading date from the required tables. 

In the process of constructing the GEM-tree. the DBMS Executor sorts and collapses values from different dis- 

40 juncts in a column together so that individual records are only read once. This results in a significant saving in the cost 
of executing a search plan, since all duplicate values are eliminated. Further, the keys are built in such a fashion that 
the data from each index is read in index order, even when multiple disjuncts are present, which facilitates the accessing 
of individual records. 

The use of disjunct numbers in the DBMS Executor on a per column basis allows the collapse of multiple column 
45 disjuncts so that the same record never needs to be read twice. This facilitates the sorting and collapsing of values by 
the DBMS Executor. 

The DBMS Executor also finds the minimum set of all predicates for a disjunct from all predicates for that column 
in a disjunct. This occurs when there are multiple predicates on a single column which conflict. The minimum set of all 
predicates is determined by finding the minimum set of values necessary for the combination of predicates. This tech- 
no nique, taken together with the sorting and collapsing of values from different disjuncts ensures that only the minimum 
amount of data need be read. 

In the process of constructing the GEM-tree, the DBMS Executor recognizes missing keys and the specification of 
ranges and IN lists in the generalized key expressions supplied by the Optimizer portion of the DBMS compiler. This 
permits a multi-dimensional view of the index, and allows efficient access. 
55 For a fuller understanding of the nature and advantages of the invention, reference should be had to the ensuing 
detailed description taken in conjunction with the accompanying drawings. 
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BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram of a computer system for storing and providing user access to data In stored databases; 
FIG. 2 is a block diagram of data structures stored in a database management system; 
5 FIG. 3 is a block diagram of the catalog data structure, representing database tables and programs, shown in Rg- 

ure 2; and 

FIGS. 4A-4H are block diagrams representing portions of the tables included in the catalog data structure of Figure 
3. 

10 DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

Referring to Figure 1 , there is shown a computer system 100 for storing and providing user access to data in stored 
databases. The system 100 is a distributed computer system having multiple computers 102, 104, 106 interconnected 
by local area and wide area network communication media 108. The system 100 generally includes at least one data- 
15 base server 1 02 and many user workstation computers or terminals 104, 106. 

When very large databases are stored in a system, the database tables will be partitioned, and different parti^ons 
of the database tables will often be stored in different physical disks controlled by different CPUs. However, from the 
viewpoint of user worksta^on computers 1 04, 1 06, the database server 102 appears to be a single entity. The partition- 
ing of databases is well known to those skilled in the art 
20 As shown in Rgure 1 , the database server 1 02 includes a central processing unit (CPU) 110, prinnar y memory 1 1 2, 
a communications interface 1 14 for communicating with user workstations 104. 106 as well as other system resources 
not relevant here. Secondary memory 116, typically magnetic disc storage, in the database server 1 02 stores database 
tables 1 20, database indices 1 22, a database management system (DBMS) 123 for enabling user and operator access 
to the database tables, and one or more catalogs 126 for storing schema infprmation about the database tables 120 as 
25 well as directory information for programs used to access the database tables. The DBMS 1 23 includes a SQL executor 
124 as well as other database management subsystems, such as an SQL catalog manager 125 ard an SQL command 
interpreter. The DBMS 1 23 further includes an SQL compiler 1 28 for compiling source code database query programs 
130 into compiled execution plans 132. The SQL compiler 128 can also be used to compile any specified SQL state- 
ment so as to generate an execution plan. 
3D End user workstations 104, 106, typically include a central processing unit (CPU) 140, primary memory 142, a 
communications interface 144 for communicating with the database server 102 and other system resources, secondary 
memory 146, and a user interface 148. The user interface 148 typically includes a keyboard and display device, and 
many include additional resources such as a pointing device and printer. Secondary memory 146 is used for storing 
computer programs, such as communications software used to access the database server 102. Some end user work- 
35 Stations 1 06 may be "dumb" terminals that do not include any secondary memory 1 46, and thus execute only software 
downloaded into primary memory 142 from a server computer, such as the database server 102 or a file server (not 
shown). 

Glossary 

40 

To assist the reader, the following glossary of terms used in this document is provided. 

SQL: SQL stands for "Structural Query Language." Most commercial database servers utilize SQL Any program 
for accessing data in a database which utilizes SQL is herein called an "SQL Program." Each statement in an SQL 
45 program used to access data in a database is called an "SQL statement." An SQL program contains one or more 
SQL statements. 

Execution Plan: An SQL statement which has been compiled into an intermediate form that specifies a method to 
efficiently access data in a database. 

so 

Execution Characteristics: Characteristics of an execution plan that have no effect on its semantics (i-e., opera- 
bility). Examples are the performance of a plan, and its resource consumption. 

Object(s): An object is a file, database table or other encapsulated computer resource accessed by a program as 
55 a unitary structure. In the context of the preferred embodiment, objects are database tables. In other implementa- 
tions of the present invention, objects may be other encapsulated computer resources which the end user 
accesses indirectly through validated methods (i.e., programs) designated specifically to access those computer 
resources. 
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DDL Statement; A data definition language statement. DDL statements are used to create and modify database 
object (e.g., tables and indices). 

DEFINE name: An object handle or link indicating an object to be accessed by an execution plan. An SQL state- 
ment may reference objects via a DEFINE names, instead of referencing them directly. Tnis enables the operator 
to redirect the program to access a different set of objects than the compile-time objects, without having to alter the 
program, by merely altering the DEFINE names to point to new objects before executing the program. 

Source Code Program/Statement: For each execution plan there is a corresponding source code SQL statement. 
A source code program is the set of SQL statements con-esponding to a set of execution plans which together are 
herein called a "compiled program." 

SQL compilation: The act of compiling an SQL statement program or an SQL program. The compilation can be a 
"static^' compilation, perfonned by invoking an SQL compiler, such as Tandem's® SQLCOMP™, to generate new 
execution plans for SQL statements in the program. An SQL compilation can also be a dynamic compilation, such 
as an automatic recompilation initiated due to a program being marked invalid or due to timestamp mismatch 
between an SQL statement and an object referenced by the statement 

End user: A person using a workstation to access database information in a database server. End users typically 
do not have the authority to modify the structure of database tables. 

Operator: A person using a workstation who has the authority and access rights to modify the sfructure of data- 
base tables and to manually initiate compilation of SQL source code programs. 

View: A logical definition of a relation without physical existence. Hie data presented by a view is derived from a 
Base Table. 

GEM: The acronym for general expression method, the term used to denote the se/eral aspects of the invention. 
MDAWl: The acronym for multi-dimensional access method, an alternate term for the invention. 
Database Server 

Figure 2 depicts some of the interrelationships between the data structures and programs stored in the database server 
102. 

A source code program 130-1 includes a sequence of SQL source code statements 160 as well as other non-SQL 
source code statements (e.g., assignments, and program flow control statements). SQL compiler 128 compiles the SQL 
source code statements into a compiled program 132-1 having a sequence of compiled statements 162 herein called 
execution plans. Each source code statements 160 has a corresponding execution pian 162. The compiled 132 pro- 
gram 132 includes a 'file label" 164 (i.e., a data structure in the program) that stores a set of runtime properties used 
by the SQL executor 124, as will be discussed in more detail below. In addition, each execution plan 162 includes, in 
addition to the compiled query, a timestamp 1 66 and a set of schema information 1 70 for each of the datebase objects 
to be accessed by that execution plan. 

Each database table or object 120 includes a "disk label" 180, herein called an object header, and an object body 
182, The object header 180 stores information about the structure, identity and other characteristics of the database 
object 120, while the object body stores the data content of the database object The object header 180 includes an 
object name field 1 84 and an object schema 1 86 defining the structure and other characteristics of the data in the data- 
base object. The object schema 186 stored in the database object's "disk label" 180 is a compact representation of the 
catalog information stored for the database table in the SQL catalog 126 and thus includes data attribute definitions 
188, timestamps 190 and other object parameters 192. 

The catalog 1 26 is itself a database having a set of tables for storing information about the database objects (e.g., 
tables and indices) stored in the database server as well as information about the programs stored in the database 
server. The stmcture of the catalog 126 will be described in more detail with reference to Rgures 3 and 4A through 4H. 

The compiler 128, as mentioned above, compiles an SQL program 130 into a compiled SQL program 132 having 
a set of execution plans 1 62. Operation of the compiler is initiated by either a manually entered compiler command 194 
or a recompile command 1 96 generated by the SQL executor 124 when it attempts to execute an invalid or inoperable 
execution plan. Manually initiated compilations are governed by user or operator entered commands 1 94 or, more com- 
monly, user or operator initiated scripts that contain sequences of data definition and compiler commands. 

The SQL executor 124 responds to both end user and operator runtime commands 1 98. Such runtime commands 
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include standard end user initiated plan execution commands, such as to retrieve data from database tables and to add 
data to database tables. Runtime commands 198 also include DDL statements, for creating database tables and for 
modifying the structure of existing database tables, although the authority to use these runtime commands 198 is usu- 
ally restricted to a set of persons herein called operators. For a fuller description of the Figure 1 system operation, ref- 
erence may be made to "TANDEM SYSTEMS REVIEW", Vol. 4, No. 2, July, 1988, the disclosure of which is hereby 
incorporated by reference. 

Catalog and Disk Label Data Structures 

Referring to Figure 3, tlie catalog 126 in the preferred embodiment, consists of a set of tables 200-220 representing 
database tables and programs in a database server or set of database servers. 

Referring to Figure 4A, the "Tables" table 200 includes one record 222 for each database table in the database 
sender. Each record 222 includes the foHovwng fields: 

TableName 224, denoting the name of the database table; 

TableType 225, indicating whether the referenced database table is a true database table or a "view." which is a 
subset of one or more database tables that are referenced in the same manner as a database table by SQL state- 
ments; 

ColumnCount 226, indicating the number of distinct columns in the table (or view); 
CreateTime 227, is a timestamp value Indicating when the table (or view) was first created; 
RedefTime 228, is a timestamp value indicating when the table (or view) was last altered; 

SimilarityCheck 229, is a flag whose value is Enabled when similarity checks on the table are aJiowed and is Disa- 
bled othenwise; and other parameters 230 not relevant here. 

Referring to Figure 4B, the "Base Tables" table 202 includes one record 232 for each datafc)ase table in the data- 
base server. Each record 232 includes the following fields: 

TableName 234 denoting the name of the database table; 

FileName 235, indicting the name of the disk file in which the referenced database table is stored; 

RowCount 236, indicating the number of rows in the table; 

RowSize 237, indicating the maximum size (in bytes or words) of each row; 

VaiidDef 238, is a flag (Y or N) value indicating if the file has a valid definition, correct file label and catalog entries; 
ValidDef 239, is a flag (Y or N) value indicating if the data in the table is consistent with data in the table's indexes 
and satisfies constraints on the tabi e; 

Constraints 240. is a flat (Y or IM) value indicating whether the table has any defined constraints; and other param- 
eters 241 not relevant here. 

Referring to Figure 4C, the "Columns" table 204 includes one record 252 for each column of each database table 
in the database server. Each record 252, representing characteristics of one database table column, includes the fol- 
lowing fields: 

TableName 254. denoting the name of the database table in which the column corresponding to this record 252 
resides; 

ColumnNumber 255, denotes a number indicating the position of the column in each row of the table, where the 
first column has a Column/Number of 0; 

ColumnName 256, denotes the column's name, also called the SQL identifier, for the column; 
ColumnSize 257, indicating the size (in bytes or words) of the data in the column; 

UniqueEntryGount 258, denotes the number of unique data entries in the column for the table or table partition: 
HeadingTexl 259, denotes a text string used as a default column heading when printing data extracted from this the 
column of the database table; and other parameters 260 not relevant here. 

Referring to Figure 40, the indexes table 210 includes one record 262 for each database index in the database 
server. Each record 262 includes the following fields: 

TableName 263 denoting the name of the database table; 
IndexName 264, indicating the name of the index; 

GoICount 265, indicating the number of columns used in the index, including the primary key columns; 
Keytag 266, specifying the keytag, if this is a primary key index for the database table; 
ValidDef 267, is a flag (Y or N[| value indicating it the Index definition is valid; 
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ValidData 268, is a flag (Y or N) value indicating if the index has valid data; 

UniqueValue 269. is a flag (Y or N) value indicating whether all entries in the index are unique: 

IndexLevels 270, indicating the number ot levels of inds)dng in this index; 

RowSize 271 , indicating the size of each index record; 

FileName 272. indicating the file that contains the index; and other parameters 273 not relevant here. 

Referring to Figure 4E, the Keys table 212 includes one record 282 for each column of the primary key and each 
other index for each database table in the database server. Each record 282, representing one table column for one key 
or index, includes the following fields: 

indexName 283, denoting the name of the index; 

KeySequenceNumber 284, indicates the position of the column in each index row; 
TableColumnNumber 285, indicates the position of the column in each table row; and 
Ordering 286, indicates whether the column is an ascending order or descending order column. 

Referring to Figure 4F, the Files table 206 includes one record 292 for each database table and index in the data- 
base server. Each record 292, representing characteristics of one database file, includes the following fields; 

FileName 294, denoting the name of a database file, and corresponds to the FileName entry 235 in a Base Tables 
record 232 or an entry 272 in an index record 262; 

FileType 295, indicates how data within the file is sequenced (e.g., entry sequenced, key sequenced, etc.); 
BlockSize 296, denotes the size of the secondary memory blocks {e.g., 512. 1024, 2048 or 4096 bytes) in which 
the file is stored; 

Partitioned 297, is a flag (Y or N) indicating whether or not the file is partitioned: 
RecordSize 298, denotes the maximum length of a record in the file; 

Compressioninfo 299, is a f iag CY or N) value indicating whether data in the data pages and index pages of the file 
have been compressed; 

Exlentslrrfo 300, denotes the size of the primary and secondary extents and the ma)dmum number of extents in the 
file; and other parameters 301 not relevant here. 

When a file is partitioned, the Partitions table 214 will contain one record for each partition of the file Indicating the 
partition name and catalog entry for each partition as well as the starting values for each column in the file's primary key. 

Refemng to Figure 4G, the Programs table 208 includes one record 312 for each registered program In the data- 
base server. Each record 312, representing characteristics of one program, includes the following fields: 

ProgramName 314, denoting the name of a program;. 
OwnerlD 315, identifies the program's owner; 

CreateTime 31 6, is a timestamp value indicating when the program was first SQL compiled; 
RecompileTime 317, is a timestamp value indicating when the program was last recompiled; 
Valid 31 8, is a flag (Y or N) value indicating whether or not the program is valid; 

AutoCompile 319, is a flag (Y or N) value indicating whether automatic recompilations of the program ^e allowed 
at run time, if required; 

RecompileMode 320, is a mode value that is set to "All" or "OnDemand" and governs (in conjunction with the Auto- 
Compile and CheckMode parameters) when program and statements vwthin the program are reconrpiled, as 
explained in more detail below; 

CheckMode 321 , is a mode value that is set to "InvaiidProgram,*' "InvalidPlans," or "InoperablePlans," and governs 
(in conjunction with the AutoCompile and REcompileMode parameters) when the program and statements within 
the program are recompiled; 

Similaritylnfo 322, is a flag (Y or N) value indicating whether the compiled program includes similarity information 
for each of the program's compiled statements, where the similarity inforniation for each statennent consists of a 
subset of the schemas for database tables accessed by that statement; and other parameters 323 not relevant 
here. 

Referring to Figure 4H,. the Usages table 220 includes one record 330 for each usage of one object by another. For 
each program that uses a database table there is a record 330, representing that relationship that includes the following 
fields: 

UsedObjectName 332, identifies the name of the "used" object, which in the context of this document is the 
tablename of a database table; 
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UsedObjectType 334, identifies Ihe type of the used object, which in this case is ^'table"; other defined object types 
for used objects are "view" and "index"; 

RelationshipType 336. is flag that is enabled if the using object depends on the used object; 
UsingobjectName 337, identifies the name of the "using" object, which in the context o^ this document is the name 
5 of a program; 

UsingobjectType 338, identifies the type of the using object, which in this case is "program"; other defined object 
types for using objects are "view" and *1ndex"; and other parameters 339 not relevant here. 

The principal components of the computer system of Figure 1 used to implement the invention are a portion of the 

10 SQL Compiler 128 termed the Optimizer and the SQL Executor 124. The Optimizer is a process component of SQL 
Compiler 128 which initially evaluates a search query and generates key expressions for the SQL Executor 124. The 
key expressions describe multi-column keys, including the range and IN list predicates on the individual columns. The 
Optimizer conducts general OR optimization and associates predicates with clusters and disjunct numbers in the man- 
ner set forth in Appendix A. The use of disjunct numbers in the Optimizer allows the invention to minimize memory 

15 space usage for the predicates and disjuncts. More particularly, in prior art compilers the process of converting a search 
query to disjunctive normal form requires that the predicates be repeated for all the disjuncts in which they appear, in 
the invention, predicates are not repeated: rather, a list of disjuncts in which each predicate appears is created in the 
manner settorth in Appendix A. In addition, many predicates share a common set of disjunct nunnbers. Another advan- 
tage of the invention resides in the treatment of IN lists as a single disjunct in order to minimize the number of disjuncts. 

20 An IN list is a shorthand way of specifying a list of single predicates on the same column that are ORed together. This 
list of same column predicates ORed together is treated as a single disjunct. 

The SQL Executor 1 24 comprises a set of procedures in the system library that executes compiled SQL statements 
against database tables, views or the database catalogs. The SQL Executor 124 evaluates key expressions supplied 
by the Optimizer portion of SQL Compiier 128 in order to create a data structure termed a GEM-tree. Each GEM-tree 

25 contains information concerning key columns, describing ranges and exact values, predicates defined on each column, 
comparison c^erators and other infonnation. The process of building the GEM-tree conducted in the SQL Executor 124 
includes combining ranges and eliminating duplicates on the key columns, while preserving the order in each column 
(i.e., ascending order or descending order). After a GEM-tree has been constructed by the SQL Executor 124, values 
are retrieved from the tree to buiid the actual keys for reading data from the required tables. The specific manner in 

30 which the SQL Executor 124 constructs a GEM-tree and builds the actual keys for reading data from the tables is set 
forth in Appendix B. 

In the process of constructing me GEM-tree, the SQL Executor 124 sorts and collapses values from different dis- 
juncts in a column together so that individual records are only read once. This is a significant saving in the cost of exe- 
cuting a search plan, since all duplicate values are eliminated. In addition, the keys are built in such a fashion that the 
35 data from each index is read in index order, even when multiple disjuncts are present, which faciiitates the accessing of 
individual records. 

The use of disjunct numbers in the SQL Executor 1 24 on a per column basis allows the collapse of multiple column 
disjuncts so that the same record never needs to be read twice. This facilitates the sorting and coliapsing of values by 
the SQL Executor 124. 

40 The SQL Executor 1 24 also finds the minimum set of all predicates for a disjunct from all predicates for that column 
in a disjunct. This occurs when there are multiple predicates on a single column which conflict. The SQL Executor 124 
finds the minimum set of ail predicates by finding the minimum set of values necessary for the combination of predi- 
cates, which avoids unnecessary reading of data. Taken together with the sorting and collapsing of values from different 
disjuncts, this ensures thai only the minimum amount of data need be read. 

45 In the process of constructing the GEM-tree, the SQL Executor 124 recognizes missing keys and the specification 
of ranges and IN iists in the generalized key expressions supplied by the Optimizer portion of SQL Compiler 128. This 
permits a multi-dimensional view of the index, and allows efficient access even if the user has defined a column to the 
beginning of an index for purely partitioning purposes. 

Specific examples of the features noted above are as follows. 

so 

Intervening Range 

Assume the user search query includes the following predicates: 

55 WHERE a=10 

AND b between 20 and 30 

AND c=40 

AND d=50; 
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In prior art systems, the predicates in column c and d cannot be used as keys because of the intervening range predi- 
cate on b. The invention allows the use of all four key columns in the following fashion. 

Range predicates are processed in the SQL Executor 124 by stepping through the values of the column in the 
range. Assume the values of b between 20 and 30 are (20, 23, 25. 30). The SQL Executor 124 first performs a keyed 
5 access for 

a=10, b=20, 0=40. d=50 

After al! the records for these key columns have been received by the SQL Executor 124, it requests fronn the file system 
the next value of b, which is 23. The value 23 is then substituted as the key value for b: 
a=10, b=23: c=40. d=50 
10 PSier these records are retrieved, the next value of b is selected G.a, the value 25). 

Missing Key Predicate 

Assume the user search query includes the following predicates: 

IS 

WHERE 

b between 20 and 30 
AND c=40 
AND d=50; 

so 

Since the predicate for key column a is missing (i.e., unspecified), prior art systems cannot use this ind^ for keyed 
access. According to the invention, however, the missing predicate for column a is treated as an implied range of 
MiN„VALUE to MAX„VALUE (including NULL values). The SQL Executor 124 first requests the first value from column 
a from the file system, and substitutes that value in the begin key values. 
25 Assume the values for a incJude all valuesfrom 1 to 100. The SQL Executor 124first performs a key 
the following values: 

a=:1 . b=20. c=40, d=50 

After retrieving the records for this set of values, SQL Executor 1 24 varies b through its four values: 
a=1.b=23.c=40,d=:50 
30 a=1 , b=25, c=40, d==50 

a=1,b=30, c=40,d=50 

SQL Executor 1 24 then obtains the next value of a from the file system and repeats the values for b: 
a=2. b=20, c=40, d=50 
a=2, b=23. c=:4Q, d=50 
35 etc. 

The result is a total of 400 accesses, 

IN Lists 

40 Assume the user search query includes the predicates: 
WHERE 

b between 20 and 30 
AND c IN (40, 100, 150) 
45 AND D=50; 

Columns a and b are treated in the same fashion as in the missing key predicate example given above. In addition, the 
values of c are included. The SQL Executor 124 steps through all the given values for a, b and c. However, since the 
three values of c are specified by the IN list, the SQL Executor 1 24 can use these values directly (and need not reguest 
50 these values from the file system). 

Elimination of Conflicting Predicates 

Assume that the following view exists: 
55 Create view VT as 

SELECT * from T where b IN (3. 9, 1 6, 25, 36) ; 
Assume the user query is: 
SELECT* from VT 
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WHERE 

b between 20 and 30 
AND clN(40, 100. 150) 
AND d;=50: 

5 

The only valid value for b in this query is 25. The SQL Executor 124 finds the union of all the predicates for b, and rec- 
ognizes that only the value 25 exists in both sets, so only that value for b will be used to retrieve records. 

General OR Optimization 

10 

As noted above, general OR optimization is accomplished by associating the predicates with different predicate 
sets in disjunctive normal form (with the exception of IN lists which are processed as a unit). Predicates are in disjunc- 
tive normal form when only ORs ewsts at the outer level. For example, consider the query command: 

15 SELECT * 

FROM T 

WHERE (a = 5 and ( {b=l and c IN (2^4,9)) 

OR (b=8 and c-7) ) 
OR (a between 4 and 6 



and ( (b between 8 and 10 and 
c between 6 and 9) 
OR (b=9 ajid c==ll) 

30 

The disjuncts for this expression are: 
{a=5 and b-1 and C IN (2,4,9)) 

35 OR (a=5 and b=8 and c=7) 

OR (a>=4 and a<=6 and b>=8 and b<=1 0 and c>^6 and c<=9) 
OR (a>=4anda<=6andb=9andc=11) 

The SQL Executor 124 eliminates any conflicting predicates within each disjunct, and then combines the overlaps 
40 among the disjuncts so tiat only the minimum set of records is retrieved, and in the order of the index. For this example, 
the following retrie\^als are made: 

a=:4, b=8, c>==6 and c<=:9 

a=4, b=9, G>=6 and c<-9 

a=4,b=9,c=11 
4S a=4, b=10, o=6and c<=9 

a=5. b=1,c=2 

a=5,b=1,c=4 

a=5, t3=9, c=1 1 

a=5, b=8, c>=6 and c<=9 
so a~5, b=9, o=6 and c<=9 

a:=5, b=9, c=11 

a=5, b=1 0, o=6 and c<=9 

a=6, b=8, c>=6 and c<=9 

a=6, b==9, o=6 and c<=9 

55 3=6, b=9, c=11 

a=6, b=1 0. o=6 and c<=9 

As will now be apparent, the invention affords several advantages over known B-Tree indexed access techniques. 
Firstly, by associating predicates with clusters and clusters with disjunct numbers, the invention permits the use of 
highly complex user search queries, and is thus not restricted to queries in which the predicates are already expressed 
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in disjunctive normal form. In addition, the use of these associative techniques saves substantial space in memory by 
eliminating the need to store several copies of the same predicate. Further, the user can specify both ranges and IN 
iists in multiple key columns, and these ranges and IN lists will be observed in building the l^eys. Likewise, a missing key 
predicate, whether intentionally or inadvertently omitted, does not prevent the construction of a usable key index. More- 

5 over, by sorting and collapsing values from different disjuncts and finding the minimum set of all predicates for a dis- 
junct, repetitive reads of the same record are eliminated from the search key (which eliminates the need to construct a 
table of records that have been read) and only the minimum amount of data will be read in the proper order. 

While the above provides a full and complete disclosure of the preferred embodiments of the invention, various 
modifications, alternate constructions and equivalents will appear to those skilled in the art. For e)ample, although the 

10 invention has been described with reference to a DBMS employing SQL, other query languages may be employed, as 
desired. Therefore, the above descriptions and illustrafe'ons should not be construed as limiting the invention, which is 
defined by the appended claims. 
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Internal Spec GEM . Optimizer 



Objective: 

This paper attempts to describe how the optimizer wiU process the parse tree for a quc^^ lo generaic the 
structures necessary for GEM. These structure wUl be used for costing and to generate appropriate 
structures needed by the Executor to implement GEM. This is best done via an example. So let us 
consider the following WHERE clause: 

(CC = 10 AND B BETWEEN 5 AND 10) OR A IN (2. 4. 5)) AND 
((A = 4 ANDC = 5)OR(CINC5. 10) AND (B = 5 OR A = 2))) 

In order for the Executor to be able to use the above set of pnsdicates for GEM, they would have to be in 
disjunctive normal form. Our objccuvc then is to get these predicates into the following form: 

CA IN (2, 4, 5) AND A = 2 AND C IN {5, 10» 
OR (A 2 AND B BETWEEN 5 AND 10 AND C = 10 AND C IN (5. 10)) 
OR(AIN(2,4.S) ANDB = 5 AKDCINC5, 10)) 

OR (B BETWEEN 5 AND 10 AND B = 5 AND C = 10 AND C IN (5, 10)) 
OR (AIK(2,4,5) AND A=4 ANDC=5) 

0RCA = 4 AND B BETWEEN 5 AND 10 AND C= 10 AND C = 5) 

As you see, this is not quite in disjunctive normal form since wc do have sets of OE^ on ihe^ame column 
specified in the form of IN lists. The Executes is able to deal widi dial. For our purpose we will caU each 
of the above 6 sec of predicates thai are ORed together as disjuncts. 

You will notice thai we can transform some predicates e.g. (A IN(2, 4. 5) AND A = 2) in the fkst disjunct 
toA = 2. Simiiarly we can transform (B BETWEEN 5 AND 10 AND B=:^ 5) ioB = 5. Wecanalsoget 
rid of the sixth disjunct altogether since C cannot possibly be 10 AND = 5 at ihe same time. There are 
many other such transformations possible. We will not attempt to do those in the optimizer. We will 
leave them for the Executor which will have the values for params and host variables available to it. 

As you can see the founh disjunct does not have a predicate on column A. Assuming that the index 
columns are A, B. C the Executor will need predicates on all columns to use GEM. It will add the missing 
predicates of the form (A >= Low AND A <= High) to facilitate this. The optimizer wiU not do this. 

Since the optimizer is generating the individual GEM Iccy predicates for the Executor it can perform a 
simple NOT transformation as welL it can transform a predicate of the form Column NOT = value 
(where value is a constant, literal, paiam, or host variable) to (Column <; value OR Coluixm > value). 
When the selectivity of the column is low this could be very bcneficiaL 

Terminology: 

Let us first define some tcnninology so that there is a better probability of this paper being understood. 
We have already defined the conci^t of disjuncts as we stated our objective. 

We will introduce another term -- Cluster, Clusters are formed of equality predicates on the same colrnnn 
that arc all ORed together. Due to the complexity in costing a mixed set of GRcd predicates, only 
equality predicates are considered for clustering at this time. Clusters are also formed of predicates that 
arc ANDcd togcdicr regardless of die cohimns they arc on. The purpose of grouping prcdicauss together 
into clusters is to reduce redundancy in structures generated in the optimizer. For example^ we will 
cluster predicates of the form (A = 2 OR A = 4 OR A 5) and (C = 10 AND B >=5 AND B <= 10). 
Other forms of predicates (muiti- valued, columns on both or neither side of the binary relational operator, 
o, unary relational operators such as EXISTS, unaiy logical operators such as IS NULL, NOT. etc., and 
LIKE) arc not clustered with other predicates. They are assigned dicir own cluster. 
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So whai wc need at ihe end is each predicate pointing to the cluster \i belongs to and then each cluster 
pointing to the disjuncis it paiiicipates in. An illustration of this is as foUows: 



Other predicate-info entries in the Predicate-Info GEM 
duster ORed of ANDed together struct Ouster 



GEM Disjunct structs 



T 



duster'^addr 



oprand^ciusterflag 

So as you can see, each predicate represented by its Predicaie-Info enay points to its GEM Cluster. A new 
field caUed clusxer^ddr in Pradicate-Info wiU facilitate this. The GEM Cluster entry wiU indicate 
whed;er the group of predicates are ANDed or ORcd together. A flag in the clusier^ags fieidof 
opi^em^luster^tmct caUed opt^nd^luster will indicate this. The GBM Cluster entry wiU then point 
to a list of GEM disjuncts that it participates in. Each disjunct entry poitits to the next. 

Here is a summary of the new fields and struccs up to this point; 

• The predicate-info entry (opt^rcdicate^truct) will have these new fields: 

cluster'^iddr 

Points to the GEM cluster enccy that this predicate-info entry belongs to. 

• The GEM Cluster entry (ope^em^luster^truct) will consist of: 

opf^njidf^luster in cluster^ags 

This bit flag indicates whether the relationship between the cluster entries is an OR or an AND. 

It wiil be FALSE (0) for an OR cluster and TRUE for an AND cluster, 
=* opt^lusier^nTtill^isjuncts'uicltistar^ags 

This bit flag indicates that this cluster participates in all disjuncts and therefore does not point to 

a list of disjuncts, 
=* first^isjunct 

This will point to the fust disjunct in the list of disjunct entries that the cluster participates ixu 
=^ last^isjimct 

This will point to the last disjunct in the list of disjunct entries that the cluster participates in. 



• The GEM Disjunct list entry (opt^emTtiisjunct^tntct) will have: 
disjunct^ 

This is a sequential id assigned to a disjunct from 0 on. It will indicate the disjunct that the 

cluster participates in. 

nexs^ddr 

Pointer to the next disjunct that the cluster belongs to. 

As we go through the parse tree we will need to keep track of certain information so that we can cluster 
predicates together, /iso, wc will use lists of clusters, created as we go through the parse tree, to create 
the disjuncts. For that purpose wc need a new structure. We will call this the GEM Parse List. 
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The GEM Parse list entry (opt^em^arse^trucs) will contain: 
cluster^ddr 

This is thtt address of the GEM cluster enay. Predicates that can be clustered together will be 
assigned the same cluster address. 
=> pred^nfo 

This is the address of the predicate- info entry created. At an OR/AND node this address will be 
used to update the clusier^ddr for the prcdicate-mfo. This will indicate the cluster p which the 
predicate has been assigned. 
col^nfo 

This will point to the column (column-info entiy) the predicate is on. If it matches the col'^info 
address of other prcdicaies in an existing cluster, tiiis predicate will be added to that same cluster. 

This points to another GEM Parse list. It indicates that the cuircnt entry is A^TDed to every entry 
in the other lisL This is the restilt of AND operations. The cross produa between multiple lists 
helps in gcncradng disjuncts. More on this later. 
next^ddr 

This points to the next GEM Parse list entry. 



TravBrsing the Parse Tree: 

So let us assume that the parse tree representation of the WHERE clause, presented earlier, is as follows: 







Ofi 


\ 


30 


/ 


2 




C- 10 



9 10 11 12 • 



Logical nodes and predicates have been numbered for (he discussions that follow. 

We icnow thai optp^^analyze^logical'^xpr currently traverses this parse tree. It processes the nee left hand 
side (LH5) first and then moves to the right hand side (RHS). It does this recursively. The logical 
operauon nodes arc numbered in the order they will be accessed. However, it will move back up the tree 
before it can go down the RHS. 

We will pass a null GEM Parse list from the first node down- The list will have entries for clusters of 
predicates that have been formed as we make our way through the parse tree. These clusters will help 
build the disjuncts that the cluster of predicates participate in. So let us follow the parse tree and see what 
we will generate for GEM. 
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25 



SO 



We first encounter the predicate C = 10. We wiU do the foUowing: 
Create an entry in the GEM Parse list, 

==> Assign to pred^info, the address of the predicae-info entry for this predicate. 

=o Lniiialize the col^info address to thar of the coliimn in the predicate - column C This wiU only 
be done when the predicate is a logical comparison with a constant or a hosivar/param. This is lo 
later consoUdate ORs on the same column into a chnin. If the predicate is not a simple 
comparison such a consolidaiion cannot be done and NULLP should be passed back for coi-info. 

=? Return the list to node 3. 




GEM Pwm m af*rf 
toQEMe 




Predkat»-lnfo 



We will save the list just returned and then proceed to node 4 (RHS). From node 4 we wiH pass a null 
GEM Pane list to its LHS thai will rcwm with the address to the predicate-info entry for B >= 5 (m 
predAinfo) and the col^info address for column B. We will save these, and proceed to the RHS that will 
cetmn with the address to B <= 10 and the col^info address for column B, 

When wc get back to node 4. since it is an AND node, we wiU determine if we have single entries in the 
GEM Parse lists returned from the LHS and RHS, or multiple ones. If we have single entries, we can 
potentially merge them into a single cluster. We will talk about mult^le entries inihe Ust later. We fmd 
that currently there is one entry in cadi Hst. However, there has been no cluster^ addr assigned. Also, the 
col^info addresses for the predicates on the LHS and RHS arc the same. So we wiU do the foUowuig: 
Since no clustcr^addr has been assigned, create a GEM cluster entry. 
=> Set the field opt'^and^;luster to TRUE for AND, 

=0 Assign the new clustcr^s address to clusier^dr for both the LHS and RHS parse Hst entries. 
=> Assign this address to cluster^ddr in predicate-info for both the LHS and RHS predicates. 

acar pcTsd^info in both the LHS and RHS Parse list entries since they have served their purpose. 
=> Keep the col^info (for column B} in the RHS parse list entry, since it was the same as the LHS 

col^info. This infOTnation is really not relevant for predicates being Ai^Ded together. However, 

in the funne if it is necessary to keep predicates on separate columns in separate clusters this 

information can be used for ANDs as we!L 
=> Send the RHS Parse list back to node 3, 

At node 3 we note that we have lists from the LHS and RHS that have single entries. Their col^info 
addresses don't match - one is for column C and the other for column B. We can stiU chain these 
predicates together since we are at an AND node. So we do the following; 

=> Assign the RHS cluster address to cluster^addr in the LHS parse list entry. 

=5. Assign this address to clustcr^addr in predicate-info for the LHS predicate and clear pred'^info. 

=> Assign NITLLP to the RHS col^info since the col'^info returned from the L 
different (LHS refers to column C and RHS refers to column B). 

=> Return to node 2 with the RHS parse list consisting of a single entry. 
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We go down the RHS of node 2 passing down a nuU GEM parse list. We perform operaiions similar to 
ihe above for this leg of the parse tree: 

=> Assign a new cluster address lo all three predicates pointing to a new GEM cluster entry. 
Set opt'^d^:Iusier lo 0 for OR. 

Return to node 2 widi a single entry in the parse Use The col'^info will point lo coiumn A. 
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Now wc arc back at node 2, Until now wc have had GEM parse lists temmed to us with single entries 
from the LHS and RHS, These lists have either had no cluster addresses assigned or one assigned only for 
the RHS list. At this node we have parse lists remmcd with a single entry from both the LHS and the 
RHS and they are not for the same cluster. So we really cannot merge the LHS with the RHS. All we can 
do a this point is chain the LHS parse list-entry to the RHS parse list entry and pass it back to node 1. 
We do that by pointing the nexf^dr field in the last entry of the RHS parse list to the first entry in the 
LHS parse iisL Wc pass back the RHS parse lisL It will have NULLP in col^info for the two entries in 
the list since they have no relevance beyond this point. However, the (Hily thing thai can indicate that they 
don't have celevance is the feet that the LHS was a set of ANDs and the RHS was asct of ORs. This is 
indicated in their corresponding GEM Quster entries. 
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We now go down die RHS of node L We will eventually end up at node 7 again after having pnxressed its 
40 LHS. We would have assigned A =4 andC = 5 to anew AND cluster. What we will get back from the 

LHS for this node is the address of this new cluster in a single entry GEM parse list. Col'^info would be 
set to NULLP, since the col^o returned atnode 8 would be for cohimns A and C (different columns). 
Pred^info 'will be NULLP, since it would have been used to assign the cluster address to the predicate-info 
entries for the two predicates. 



At node 9 we will get back from its LHS another parse list with an entry. It will have the address to 
column C in coi'^info. C = 5 and C = 10 will point to a new GEM cluster entry. This entry will indicate 
that they are related to each other by an OR, 



so 
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At node 1 1 we wiU get back col^info addresses that differ (for colunms B and A). We caiuioc chain these 
together since they are ORed, We do not have any clusters created at this point - both the LHS and RHS 
lists returned entries with no cluster address. We will do the following: 

=> Create two new cluster entries. Assign B = 5 to one of these and A = 2 Co the other. 

=> Set opf^and^lusier lo OR since there is a single predicate in the clusters which can be ORed lo 

predicates at higher levels, 
=* Chain the LHS parse list entry to the RHS via nexi'^addr o f the RHS entry. 
Return the RHS parse list to node 9 with col-^info for both entries intact. 

Let me bring your attention back to this last statemenL You noticed that at node 2 which was an OR we 
passed NtJLLF back in col'^info. We don't do that here for the following reason. Say we had a WHERE 
clawsethatsaidB INa.5)ORA = 2, Esscndally we wiU end up with parse nodes at the end that looked 
similar to node 1 1 and down, tfowever. there would be another OR node preceding it with its LHS having 
B = 1. EveiithoughB = 5andA = 2cannotbecha5ncdtogcthcrB - 1 and B - 5 can be chained in an 
OR relationship. Thai is the reason we are passing the coi'^info back with their respeciive parse list 
entries. At each OR node we have to be aware of passing this information back. 

From the same token at each OR node we have to be aware of looking for sucii cases so that appropriate 
predicates can be clustered together. Our example does not have such a case. AND nodes do not have the 
same issue. At each OR node, with muldple parse list entries from the RHS. we will have to compare the 
col^info from the LHS to the col^info from all RHS entries. If there is a matching entry in the RHS list 
and if the cluster entry foe it indicates an OBt* we can add the LHS entry to the same cluster. The 
cluster^addr for the LHS predicate will be assigned the same cluster address. 
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So we now make it back to node 9 with the GEM parse Use from the LHS with a single entry and one from 
the RHS with 2 entries. It would seem that we would have to perform a cross product between these two 
lists since we are at an AND node. So for each entry &om the RHS parse list we wiU have a pointer to the 
first entry of the LHS list (or the only entry in this case). We have called this pointer in the GEM Parse 
list and^o^ist for the list that it will be ANDed to. We will set the col'^info fields for all three entries to 
NULL, since ihey have no relevance above this node (we arc at an AND node). 

At node 7 we have again a list with one entry from the LHS and one with two entries from the RHS 
(though each of them point to yet another entry that they are ANDed to). Since we arc at an OR node we 
will just change the nexf^addr for the last entry in the RHS list to point to the LHS list eniry. We will set 
col'^info for the LHS list entry to NULL. Even though we are at an OR node the cluster from die LHS is 
an AND cluster. 

We are now back to node 1, We have a list with two entries from the LHS and 3 from the right. We have 
to perform a cross product agaiiu We will go through the parse list from the RHS and point each entry to 
the list on the LHS. As we go through the RHS list we see that the fust entry already points to another 
lisL So we go to the other list and make U point to the LHS list using the and'^to^list field to do so. When 
we proceed to the next entry in the list we "see that it is pointing to the same list as the first so we are 
already set for thaL For the third entry in the list we again assign the address of the first entry in the LHS 
list to and^to^iist since it is currendy null. We maintain the cross product infonnation in this fashion 
instead of completely materializing it at this time to reduce redundancy. We are finally done with, the 
processing of the parse tree. 

There may be cases when the top-most node in the parse tree is an AND and it gets a list with only one 
cluster entzy from die LHS. If there are muldple entries retioned from the RHS^ essendally the cltister 
returned from the LHS is part of evety disjunct that will be generated. In such cases we can avoid adding 
any disjuncts for that cluster to save space. For this Quster entry the flag opt^luster^n^ll^isjuncts will 
be set to indicate that the cluster partidpaies in all disjuncts. 




Generating the Disjuncts: 

So now that we have gone through the parse tree and assigned predicates to clusters, we need to form the 
disjuncts and assign the clusters to those disjtmcts. To do that we will use the GEM parse list that resulted 
from our trip through the parse tree. We will treat this list as made up of three parse lists ANDed 
together. The first parse list has clusters 6, 5, and 3 as entries. The second has cluster 4. The third has 
entries for clusters 2' and 1. 

Lets start at the first entry in the first parse list — for cluster 6. This entry points to the second parse list 
via its and^o^ist field. We go to the second parse lisL Its entry for cluster 4 points to the third parse list. 
So wc get to the third parse list. Its first entry for cluster 2 docs not point to any parse lisL So at this 
point we create our first disjimcL Wc start with 0. Since the entry is for cluster 2 we go to cluster 2 and 
point its firsc^sjuna and last^isjunct to a new disjunct cnay with a disjunct^d of 1. Now that we arc 
done with that entry, we proceed to the next entry in the third parse list which is for cluster 1. Again, it 
does not point to another parse UsL So we create another disjimcc This will have a disjunct^d of I. So 
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Optimizer 



we 20 to cltister I and point it to this disjunct. Done with the Ourd parse list we come back to the cluster 4 
Tti?^^ the s«ond panelist. We know that this cluster would have to belong lo aU the d.qunoe we just 
r^Jm!Si nSo we hang off disjunct entries 0 and 1 ftom cluster 4. Cluster 4-s/rrjr^«9««cr 

do the same for cluster 6 which gets the same two disjuncts 0 and I. 

We now proceed to the second entry in the first parse lisL TUs is for cluster 5 and takes " 
wt«rtot^»ny for cluster2lnfte third parse list. since.lhelastdisjunot cieaurf was I this ume we 

i Vi?^7S.^^72 We^d disiiaicta to die list of disjuncts for cluster 2. Its/-wr\iiflttBcr points to 
^u:^otS« mS« 2%e d^^ 

dL unct - with a disi^na-id of 3. Now cluster I will point to dispmcts I and 3. Commg teck to the 
Sr 4 en^ in the second parse list we know that the disjuncts we pist created were 2 through 3. So we 
idl^'XSo^er"^ list of disjuncts. Comm^ 
we create entries for <Jisjimcts 2 and 3 for « as well. 

eCf-^«ri^s'r.^:^icT^^ 

fust parse list, we create disjuncts 4 through 5 for it 

If for a clusterenuy the flag opl^rfim<:/^n^J^/fl"n^« is set we wiU not physicaUy create th* disjunct 
"5for~J^imu:t.how'ever.wiUtogicaUybec««^ 

The foUowing diagram shows this perspective. 7bc cluster entries seem to be duplicated ih^ i^Y 
a^.^ SiniweaTh^aiUfiomacol^^ 

^t«Tdia«entcol«,L ate pointmg to the same cluster enay and bel^^^ 

Clusters 1 and 3 arc repealed for iUustiation. 



Predicate-info entries accessed via 
CoIumn*lnfo 



Clusters DisjurKts 



J— r" 



1 



3^ ' 





... 

C- 10 









— g— 
C - S 
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So, for each colunui we can start from the column-info entry and go through opc^pred^ol^tnict for ail 
predicates for that column. The opt'^pred'HioI'^truct will point to the predicate -info entry for the 
predicate. This predicais-info entry w.LU point to the GEM cluster enoy that it belongs to. The cluster 
entry will point to the disjiincis it participates ij:t If we did that using the previous diagram as a guide we 
would get (he following matrix of columns by disjuncis - the objective ttiat we had when we started out: 



COLUMNS 



Disjunct 


A 


B 


C 


0 


(=2of = 4or=5)& = 2 




= 5or= 10 


1 


= 2 


>= 5 & <=10 


= 10&('=5or:=10) 


2 


= 2or = 4 or=5 


= 5 


= 5or=I0 


3 




C>=S&<=:10)&=5 


= lO&(=5or=:IO) 


4 


C=2or = 4or-5>& = 4 




= 5 


5 


= 4 


>=5&<=10 


= 10&=5 



Creating the Executor Structures: 

The structures generated for the Executor Stan with the column (each key column of the selected index). 
Each key value point to a disjunct list (from 0 to 5 in this case). Each of those disjuncts points to the 
clusters that participate in that disjunct Each cluster then points to the list of predicates that apply to thai 
column and disjunct. The only difference between the clusters created for the optimizer and those for the 
executor are that the ones for the executor will not AND predicates on two separate columns. An executor 
cluster restricts itself to a single columiL 

This is illustrated by breaking cluster 1 into IB and IC and cluster 3 into 3A and 3C. The cluster entries 
have been repeated on the right hand side for iUustcaticn only. Acnially the clusters oa the left will 
direcdy point to the first predicate in the chister. Disjuncts potndng to no clusters will have a flag 
indicating missing predicates. 
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[33-^ 



3-^ 



Clusters Predicates 



10 

e. to 



23 



EP 0 747 839 A1 



GENERAL EXPRESS lON METHOD (GEM) 
Innemal Design Specification 



Rohit Jain 
Harry Leslie 
Hedieh Yaghmai 
Dave Birdsall 



Version: 1.1 - 2/9/95 



Print date : Apr 13, 1995 (10:33:05) 
Source ■ : $SQIiDOC.GEMDOC.IS on \TESS 

Copyright (c) 1995, by Tandem Computers Inc , 



APPENDIX B 



24 



EP0 747 839 A1 



General Expression Method 

1 , 0 INTRODUCTIOM- 

This docxarnent outlines the internal design and architecture for 
implementing a new access method in NonStop SQL to augment range and 
list processing. This new access method will result in performance 
gains when dealing with large databases. It will allow SQL to use 
indexed access when complicated expressions are specified in a query. 

There is a customer commitment to make, this enhancement available by 
April of 1995- 



1.1 Documenfc History 

Version 1,0 December 10, 1994 Draft version. 

Version 1.1 February 9, 1995 Ready for wider review. 



1,2 Relate.d Doctimenfcg 

[1] "Begin/End Index Access with Embedded Ranges, Harry Leslie, v^" 
(11/11/93) . 

[21 "A New Access Method for Decision Support Queries , " Surtil ,Sha.rma ^ 
and Hedieh Yaghmai, (4/4/94). 

[3] "Begin/End Key with Embedded Ranges and In Lists, 

General Expression Acess Method (GEM)," Dave Birdsall, 
Rohit.Jain, Harry Leslie, Hedieh Yaghmai, (12/5/94) . 

[4] "General Expression Method (GEM) External Specification,- 

Rohit crain, Harry Leslie, Hedieh Yaghmai, (12/15/94) . 

[5] "Begin/End Key with Embedded Ranges and In Lists," Harry Leslie, 
Sunil Sharma, Hedieh Yaghmai, (4/8/94). 



2.0 ST3MMARY OF EXTEKNALS 

The general ' expression access method (GSM) will allow users to 
specify queries with arbitrarily complex expressions. The method 
will normally access only those records which are needed and no more. 

The SQL compiler will heed to determine the number and cost of. 
necessary accesses.' 

Executor aigorithms to support GEM merge expressions at runtime to 
eliminate duplicates. Fetch processing calls the GEM algorithirts to 
obtain subsets for scanning. When all subsets have been traversed, 
fetch processing returns EOF. 
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A define -ivill be available to the users to control when che new 
access method is used. These defines are: 

* =_sql__cmp_use__gem 

* =_sql_cmp__no_gem 



3,0 SUMMAJRY OF INTERNAL DESIGN 

We describe the internal design for run- time and for compile time. 



3.1 Design Overviev 

At compilation' time a series of new key expressions will be generated 
for the executor. These expressions will describe a mulcx- column 
key, and the range and IN list predicates on those colximns. The 
generated expressions are evaluated at execution time by new key 
building routines to create a data structure which we wiXX ^-efer 
to as a GEM- tree. The GEM- tree will contain infonnation about Che 
key columns, describing ranges and exact values, state and context 
information, predicates defined on the coliimn, comparison operators, 
■etc. ■ - - ■■ ■ 

Building the GEM- tree consists of combining ranges and eliminating 
duplicates on the key columns at runtime. After the GEM- tree has 
been prepared^ values are retrieved from it to build keys for reading 
data from the table. Probes into the table will be on. exact 
key values and no ranges are specified, other than on terminating 
columns. Gaps in values of columns will be skipped; that is no keys 
will be built for values which do not exist in the range. 



3^2 Design ObjeGtivea 

This project's main objective is to improve the performance of 
queries against large databases. The new access metliod will 
support browse and non-browse accesses and will provide as much 
concurrency as possible. It will not introduce any regressions where 
existing queries can no longer run, or have a degradation in their 
performance. 



3>3 Run Time Proceaalng 

Run time processing consists of two phases: GEM- tree construction 
and GEM- tree traversal, 

GEM- tree construction takes a predicate list from tlie RTDU, 
representing the key expression in cluster disjunctive normal form 
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{defined below) , instantiates outer values, and transforms the result 
into a GEM- tree. Because we must have instantiated outer values, 
GEM- tree construction must take place at cursor open time. 

GEM- tree traversal traverses the GEM- tree {no ' surprise there) , 
generating key values for table access. The set of key values 
generated is no more than those described by the key predicates. We 
may generate less, if we know that certain key values do not exist in 
the base table. GEM- tree traversal takes place at cursor fetch time. 
A fetch may cause zero or more key values to be generated. 



3.3.1 GEM- tree Construction 

Conceptually, GEM- tree construction consists of two steps. We first 
build a GEM- tree for a single disjunct (defined precisely later) . 
We then merge these diajuncts together. This is done for each key 
column independently* 

In pseudo-code, 

For each disjunct 

For each column, from rightmost key to leftmost (*) 
Build a GEM- tree for that column and that disjunct 
If it is empty 

Exit inner for loop, advancing to next disjunct 
Else 

Merge it into the result GEM- tree (Stage 2} 
End if 
End for 
End for 

The logic at (*) and (**) deserves some explanation. If a GSM- tree 
for a column for a given disjunct is empty, that means that 
the predicates for that coltmm contradict one another. That is, 
no values will satisfy the column predicates for that disjunct . 
Therefore, we can discard the disjunct as a whole. 

We work from rightmost key to leftmost at (**) so that there will be 
no probes on that disjunct. If we work from leftmost to rightmost, 
there may be GEM- tree entries for the first column for tbat disjunct; 
We may do I/O's to find values for that coluxnn. Then we traverse 
to the next column axid discover there aren't any values for that 
disjunct. We would have done unnecessary I/O. 

On the other hand, working right to left, we may get GEM- tree entries 
in columns to the right that we never traverse to. This does no 
harm (other than a bit more CPU during traversal perhaps) ; it costs 
no I/O's. This will become moire clear once the GEM- tree traversal 
algorithm is explained below. 

If, in some disjunct, there are no predicates on some key column, it 
simply means that all possible values in those coliomns are selected. 

Before describing the stages in detail, we give some definitions. 



(Stage 13 
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3.3.1.1 Definitions - GEM- tree, -Predicate List. 

An atom is a predicate of the form <column> <op> <value>, where 
<value> is the result of some expression evaluated at cursor open 
time (or before), and <op> is =. , 

ccoluinn> IS NULL. (Note in particular that <> is excluded, and IS 
NOT ISTULL is excluded.) 

A cluster is a disjunction of atoms all having the same <coluinii>. An 
atom by itself is also considered a cluster. 

A disiunct is a maximal conjunction of clusters. A cluster by 
itself is a disjunct, if it is not ANDed with any other. Note 
that this definition is more restrictive than the usual macUematrcal 
definition. {In mathematics, a disjunct is any operand of an OR.) 

A disiunct nximber is a" number identifying a disjunct, Disjunccs are 
. numbered sequentially, starting at 0. 

A oredicate is in cluster disiunct normal form if it is a disjunction 
of'disjuncts. This differs from the usual definition of disjunctive 
normal form in that we allow clusters as operands of AND nodes 
instead of restricting them to just atoms - 

A predicate list is a physical representation of predicates on 
a particular column across all disjuncts, for a predicate in 
cluster disjunct normal form. (The predicate as a whole, then, xs 
represented by a sequence of these lists, one list per key column.) 
The Generator creates such lists. A predicate list consists of 
entries of the following form, sorted in disjunct number order. 

<disjunct number> <op> <value> <OR flag> 

Each entry represents an atom of the form <column> <op> <value> (the 
<column> is omitted 'in the physical representation, since i.t xs the 
same for all predicates in the list) . 

The <0R flag> is on if this atom and the next one belong to the 
same cluster. (Note that this would imply that they are in the same 
disjunct also.) 

An interval is a range of values. It may be closed (in which case 
it includes its endpoints) , open (in which case it excludes its 
endpoints) or semi-open (in which case it includes exactly one of its 
endpoints) . We denote an interval as follows: 

Ca,b) - the range' of values x satisfying a < x < b 

[a,b] - the range of values x satisfying a <= x <= b 

(a,bl - the range of values x satisfying a < x <= b 

[a,b) - the range of values x satisfying a x < b 

Note that this definition is the same as the usual mathematical 
definition of the term "interval" - 



28 



BP 0 747 839 A1 



General Expression Method 

The maximum possible value is denoted as hi; the minimum possible 
value as lo. So, the interval containing all possible values for a 
column is denoted [lo,hi] . 

If a column is nullable, we consider hi to be the NULL value. That 
is, NULL is added to the set of possible values, after the highest 
non-null value. So, for our purposes here, NULL > all non-null 
values, and we consider NULL ~ NULL, 

This is- different than the SQL sense of NULLs, but identical to 
how FS2 and DP2 treat key ranges. It should be clear from context 
whether comparison is meant in key order sense or in SQL sense: when 
we discuss predicates, we mean SQL sense; when we discuss intervals 
and values in an interval, we mean key order sense. 

In some cases, we will need to refer to the maximum non-null value. 
We refer to this as non-null-hi . 

A reference set is a set of disjunct niombers. 

A GEM-interval is a set of the form .{ I, R } where I is an interval 
and R is a reference set. It is intended to represent a range of 
values, and relate that back to disjuncts. that that range of values 
satisfies. 

A GEM- tree on column A is a sequence of GEM- intervals, having the 
property that the intervals are disjoint, and the sequeiTce is ^ordered 
by beginning endpoint of the intervals (using the convention that 
[a, a] precedes (a,b]). 

A GEM- tree on coluions A, B, C. N is a sequence G(A) , G(B) , . . . , 

G(N) where G(x) is a GEM- tree on colinnn x. 

In much of the following discussion, we will simply u.se the term 
GEM- tree ; it should be clear from the context which of these 
definitions we mean. 

3.3.1.2 Stage 1. • ' 

In stage 1, we build a GEM- tree for a single column, X, for a 
single disjunct with disjunct number n. In the. process, we eliminate 
redundancies among the predicates- 

The input to this process is a sequence of predicate lists 
representing a predicate in cluster disjunctive normal form. ^Each 
list represents the predicates on a particular key column. Within 
the list, oredicates are ordered by disjunct number. So, we^ will 
consume all predicates for a particular coluitm and a particular 
disjunct each time we execute stage 1 processing. 

In the algorithm below, G and Gtemp are GEM- trees on column X. GI 
represents a GEM- interval . While the algorithm executes, they hold 
intermediate results- On completion, G has the desired GEM- tree. 
The notation (n) denotes logic that is expanded in further detail 
later. 
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G = <empt:y seo 

If there are no predicates on this column 

G = { [Lo.hi] . {n} } 
Else 

For each atom on column X 

Reoresent the atom as an interval I 
Gl'= { I. {n} } 
Gtemp = { GI } 
While the OR flag is on 

Advance to the next atom 

Represent it as an interval I 

Gtemp = { I, {n} } ^ r ^-^ i 

Gtemp = Gtemp GEM- tree- union- 1 {. Gl ) 
End while 

G = G GEM- tree- intersection Gtemp 
End for 
End if 

(1) This case can occur, 'for example, in a disjunct chat has 
predicates on the first and third key columns but not the second. 

(2) Representing an atom as an interval is described below. Recall 
that an atom is a predicate of the form <column> <op> <value>, where 
<op> is limited to or >, or it is <colunin> IS NULL. To 
make the pseudo-code easier to read, we represent <:column> as X, and 
< value > as v» "" " "" 

Case 

Atom is X < v 

I = Clo,v) 
Atom is X <= V 

I = [lo,v] 
Atom is X = v 

I = [v,v] 
Atom is X >= v 

I = [v, non-null-hi] 
Atom is X > V 

I = (v, non-null-hi] 
Atom is X IS NUIiL 

I = Etii.hi] (here, X is nullable, and hi is NULL) 
End case 

(3) Note that since a cluster is an OR of atoms on the same column, 
it will never be the case that the OR flag will be on and the next 
atom is for a different column. 

To describe GEM- tree -union -1 and GEM- tree- intersection, denote the 
first GEM- tree as Gl, the second as G2 . G denotes a GEM-tree which 
during the algorithm holds an intermediate result. At tentiination, 
G holds the result GEM- tree. Denote the GEM-'intervals of G as { II, 
{n} ), { Ik, {n} }, and the GEM- intervals of G2 as { Jl, {n} 

}, { Jm, {n} I denotes an interval, holding an intermediate 

result . 

(4) GEM- tree-union- 1 is described below: 



(1) 

E2) 

(3) 
(2) 
(4) 
(5) 
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G = Gl 

For j = 1 , . . . , m 

Find the first interval Imin that overlaps Jj (4a) 
If such an interval exists 

Find the last interval Imax that overlaps (4a) 
Create a new interval I representing the union of Imin, 

Imax and Jj . r i i • ^ • i. r ^ ^l^i i 

Replace { Imin, {n} } { Irnax, {n} } xn G with { t. {n} } 

Else • . ^ , , 

insert { Jj , {n} } into G in the appropriate order (4c) 

End if 
End for 



overlaps 
. . . , max- 



(4a) Note that since the intervals are in order, i-f Irain 
Jj, and Imax overlaps J j , then li overlaps Jj for i = min. 
Further, if Imin is the smallest such and Imax the largest such, then 
no other intervals overlap Jj . Also, if min < k < max, XJc is in fact 
a subset of Jj . 

Determining whether two intervals overlap is done in the way one 
would expect. We show below tJtie logic for two closed intervals. 
Denote the intervals as I = [a,b] and J ^ [c,d]. Comparison is meant 
in the key order sense. The generalization to all intervals should 
be straightforward. 



lies before lc,d]) 
lies before [a.b] ) 



If b < -c (that is, if Ca,b] 

The intervals do not overlap 
Else if d < a {that is, if [c^d] 

The intervals do not overlap 
Else 

The intervals overlap 
End if 

{4b} Denote Imin as la,b] , Imax as [c,d] and Jj as [x,yl . Then the 
the union of Imin, imax and Jj is [min(a,x) ,max (d,y) ] , Min 

and max are computed in key order sense. The generalization to all 
intervals, agaiii, should be straightforward. 

(4c) Note that as "we iterate through Ik to find Imin, we stop when 
we find an interval that overlaps, or when we find an interval Ca,b3 
that lies after J j , In this case, we have stopped at precisely the 
proper point to insert Jj'. 



(5) GEM- tree- intersection is described below: 



G ^ <empty set> 
For i = 1 , . . . , k 

For j = 1, m and while I is non-empty 

If li overlaps Jj 

Add the GEM- interval { li intersect Jj , {n} 
to the end' of G 
End if 
End for 
End for 
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(*) Note tliat since Gl and G2 have their 
result does also if we append to the end. 

3.3.1.3 Stage 2. 



General Eocpression Method 
intervals in order, the 



In stage 1, we built a GEM- tree on a ^ 
disjunct n. In stage 2, we glue these GEM-trees together. We 
step-wise: We take a GEM-tree for disjunct n and glue rt 
GEM- tree for disjuncts 0, . 



column X for a particular 
do it 
to the 



n-l. 



The -stage 2 algorithm is simply a union of two GEM- trees. We 
call this operation GEM- tree -union- 2 . It is somewhat different than 
(and in fact is a generalization of) GEM- tree -union- 1. The added 
complication is that the reference sets of the GEM- intervals are no 
longer identical . 

Since we wish to remember which key values are^ generated by each 
disjunct, we treat overlapping GEM- intervals differently. We break 
out their intersections into new intervals, whose reference sets are 
the union of the originals. 

As before Gl and G2 denote the GEM-tree operands, G denotes a 
GEM-tree intermediate result. At termination, G holds the desired 
GEM-tree ( li. Ri } is tlie ith GEM-interval of G, while { JJ, R:] 
} is the 'jth GEM-interval of G2. Ix, ly. Iz, and J represent 

interval intermediate results. 

GEM- tree -union -2 is described below: 
G = Gl 

For j . = 1, . . . , m 

i = 1 
J = jj 

While i <= k and li lies before J 

i = i+1 
End while 

While J is non-empty and i <= k and li overlaps J 



IX = li "front-" J 
Jx = J "front-" ri 
ly = li intersection 
Iz = li "back-" J 
J = J "back-" li 
Replace GEM-interval 
from the following: 

Lf Ix is non-empty 
if Jx is non-empty 



(i.e. the part of li before J, if any) (1) 

(i.e. the part of J before li, if any) 
J 

(i.e. the part of li after J, if any) (2) 

(i.e. the part of J after Xi, if any) 

{ li, Ri } with the sequence assembled 



f Ix, Ri ) U 
I Jx, Rj } ii 



(3) 



( ly, Ri union Rj } 
{ Iz/ Ri } if Iz is 
i = i + 1 
End while 
If J is non-empty 

Add GEM-interval { J, 
End if 
End for 



(Xy is always non-empty) 
non-empty 



Rj } to G after I(i-l) 
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(13 li "front-" J denotes that part of li that precedes O". If Xi 
is [a,b] and J is [c,d] , it is computed as shown below. (The 
computation assumes that I and J -are already known to overlap. This 
assximption shows .up in the line C*).) The generalization to other 
kinds of intervals should be obvious. 

If a < c 

Return [a,c) C*) 
Else 

Return the empty set 

(2) li "back-" J denotes that part of Xi that follows J*. If Xi 
is [a,b] and J is [c,d], it is computed as shown below. (The 
computation assumes that I and J are already known to overlap. This 
assumption shows up in the line (*).) The generalization to other 
kinds of intervals should be obvious. 

If d < b 

Return (d,b] <*) 
Else 

Return the empty set 

(3) Note that at most one of Ix and Jx will be non-empty. 'The 
sequence assembled consists of at most three GEM- intervals - 



Note that GEM- tree -union- 1 and GEM- tree-union- 2 could be implemented 
by a single algorithm, if we added logic here to deal with the case 
where Ri = Rj . In Stage 1, Ri Rj always (and in fact is a 

singleton, {n}) . In Stage 2, Ri <> Rj always. To unify the two 
algorithms, we would change GEM- tree -union- 2 to not break out the 
intersection of li and Jj into a separate interval if Ri = Rj , but 
instead simply glue li into Jj - . 

But, it is a little more CPU-ef f icient if we implement these as 
separate procedures (and avoid computing whether Ri = Rj) . 

3 . 3 . 1 1 4 An Example . 

To help fix ideas, let us go through an example. Consider the 
following predicate on non-nullable key columns A, B, C: 

(A between 1 and 100 AND 

B W 10 AND 

C < 30) 
OR 

(A between SS and 120 AND 
B between 10 and 20 AITO 

C >= 30) ... 
OR 

(A in (100,200) AND 
B >= 10) 

1. First, we write this as a sequence of predicate lists. We number 
the disjuncts: 
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0 -->• (A becween 1 and XOO AND B = 10 AND C < 30) 

1 (A becween 65 and 120 AND B between 10 and 20 AND C >= 30} 

2 --> {A in (100,200) AND B >= 10) 



We get the predicate lists: 

Column A Column B 



0, >=, 1 



0, 10 

1, >=, 10 

1, 20 

2, 10 



Coluinn C 

0, <. 30 

1, 30 



0, <=, 100 

1, 65 

1, 120 

2, 100, OR-flag 
2, =, 200 

Notice we have replaced the 'betweens' and 'ins' wich atoms. 
Notice also that- in the third predicate list: there is no 
predicate for disjunct' 2. 

We next build GEM- trees for disjunct 0. We start with column C. 
There is only one atom, C < 30, for disjunct 0, gxvmg rise to 
the interval [lo,30). The resulting GEM-tree is: 



ColnTnn C ; 
Interval 
[lo,30) 



Ref erence- Set 
{ 0 } 



Similarly, for column B, we get: 



Colmaix Bi 

Interval Reference set 

[10,10] { 0 } 

The atoms on A in disjunct 0 give rise to the intervals 
[l,hi] and [lo,10Ql- Since the atoms are ANDed, we use 
GEM- tree- intersection an them to obtain [1,100], The resulting 
GEM-tree on A for disjunct 0 is simply: 



Colimrn Ai 

Interval Reference Set 

[1,1001 { 0 } 

Next we consider disjunct i. We first build the GEM-tree for 
column C, obtaining 

Column Cj 



Interval Reference Set 

[30, hi] { 1 y 

Next, seeing -that this tree is non-empty, we do stage : 
processing to merge this tree into our result. 

The GEM- intervals are { [lo,30), { 0 } } and { [30,hi] , { 1 } } 
These do not overlap, so the resulting GEM-tree is simply 
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Gol-uim C: 
Interval 
[lo,30) 
[30, hi] 



Reference Set 



We continue on^ doing stage 1 processing for disjunct 1 on column 
B. We get: 



Coltaimi Br 
Interval 
[10,20] 



Reference Set 
( 1 } 



We then do stage 2 processing, merging it with disjunct 0, We 
have the GEM- intervals { [10,10], { 0 } ) and { [10,20], { 1 } 
}. These overlap, in fact [10, XO] is - contained in [10,20], We 
brea)c out the intersection of these two into a separate interval, 
with its reference set being the union of the originals. The 
resulting GEM- tree is 



Colnffln Bi 
Interval 
[10, IT)] 
(10,20] 



Reference Set 



Notice that the interval [10,20] changed to (10,20]; we took; out 
the part that overlaps [10,10] . - 

We move on, to do stage 1 and then stage 2 processing for 
disjunct 1 on coluann A. Stage 1 processing yields: 



Coliiam As 
Interval 
[65.1203 



Reference Set 

{ 1 } 



In stage 2. we have the GEM- intervals { [1,100], { 0 } } and { 

[65,120], { 1 } }. Again these overlap, but unlike with column B, 

one* is not- contained in the other. So, this time tiie result is 
three GSM- intervals: 



Coluagi A; 

Interval 
[1,65) 
[65,100] 
(100,1201 



Reference Set 



Next, we process disjunct 2. There are two new tvists here. 
Column C has no predicates, so we generate a GEM- interval 
containing [lo,hi] , Column A has a cluster consisting of two 
•atoms. This shows GEM- tree -union- 1 processing. A = 100 gives 
rise CO the interval [100,100], while A - 200 gives rise to 
[200,200] - They do not overlap. So, both result in entries m 
the GEM- tree. The GEM- tree (for all columns) is as follows: 
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Coli-imn C; 
Interval 
[lo,hi] 



Reference Set 
{ 2 } 



ColUTTITI B : 

Interval 
[lO.hil 



Reference Set 
{ 2 } 



CoLmriTi A; 

Interval 
[100, 100] 
[200,200] 



Reference Set 



We merge them in stage 2 processing as before, doing C first 
A last. Finally we obtain: 



and 



Colimn A; 

Interval 
[1,65) 
[65,100) 
[100, 100] 
(100, 120] 
[200,200] 



Reference Set 

0, 1 } 
0,1, 2 } 



ColnTTin Si 
Interval 
[10,10] 
CIO, 20] 
(20, Hi] 



Reference Set 
{ 0, 1, 2 } 
{ 1. 2 } 
{ 2 } 



Colunai C: 
Interval 
[lo,30) 
[3Q,lii] 



Reference Set 



3.3,2 GEM- tree Traversal 

During GEM- tree traversal, we generate the key values that satisfy 
the original predicate. We may enumerate all possil>le .values^ (the 
"dense" algoirithm) , or we may s3cip values ]cnown not to exxst xn tne 
base table (the "sparse" algoritiim) . 

First, we describe the GEM- tree traversal algorithm in general, then 
we describe its "dense" and "sparse" variants- 

3,3.2-1 The GEM- tree Traversal Algorithm, 

We give two descriptions of the algorithm. The first is a conceptual 
one: It shows how we iterate over the set of key values. We don t 
implement this algorithm directly, since we want to interleave this 
iteration with fetch processing in general. The second descrxptxon 
reformulates the algorithm in a way that admits this interleaving. 
The second description is what is actually implemented. 
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Concepcually , we can express GEM- tree traversal as a recursive 
procedure. Below, G is a GEM- tree on n columns (on each, recursive 
call, n is effectively decremented) . R is a reference set. (an 
accumulation of context that tells us which disjuncts the partial key 
value satisfies) - .PK is a partial )cey; it is a selection of key 
values from higher level invokations of the procedure. 

Traverse (G, R, PK) : 

For each GEM- interval { li^ Ri } of the first column of G 

Rx = R intersect Ri (1) 
If Rx is non-empty 

If there are more columns in G 

For each value V in the interval li (2) 
Traverse CG minus its first column, Rx, PK | I V) (3) 
End for 
Else 

Denote li as Ca,b] (4) 
Emit the begin key value PK | | a 
Emit the end key value PK [ | b 
End if 
End if - 
End for 

The argijiments to Traverse are a GEM- tree G, a reference set and- 
a partial key, PK. In the mainline code, we call Traverse with 
the arguments (G, <empty set>, <eir^ty key>) , where G is -our original 
GEM- tree . 

(1) By taking the intersection, we find values of the current key 
column that might appear with values of prior columns. If we don't 
carry this context along, we may generate key values that do not 
satisfy the original predicate- 

(2) This iterator may be the "dense" or "sparse'' algorithm. The 
dense algorithm enumerates all possible values in the interval, while 
the sparse algorithm may skip over values that don't in fact exist 
in the base table. We may use dense on some columns and sparse on 
others, independently. 

(3) The recursive call in effect traverses to the next column of the 
GEM- tree . 

(4) The algorithm shows treatment for a closed interval. If it 
is open, we set the after flag for the- begin key and the before 
flag for the end key. The generalization to semi -open intervals is 
straightforward. 

Next, we recast this algorithm in a way that allows it to interleave 
with fetch execution. We unwind the recursion by associating a 
context with each coltmm. The information in this context contains: 

o the current GEM- interval, if any - denoted C_Intv 
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o the current value within that interval- - i^e. the value returned 
for this columa on the last call to the iterator - denoted C_Val 

o the intersection of the reference sets of the columns to the left 
given their current GEM- intervals - denoted C_Ref 

AT so fo- the GEM- tree as a whole, we keep track of the first coluinn 
that' do^s not have a current GEM- interval . This is denoted below 
as C Col, and we represent it numeric^ly, .x.e. 1 means the first 
coluiin does not have a current GEM- tree interval, 2 means the first 
one does, but not the second, and so on. We assume the GEM- tree has 
n coium^is- We use the convention that C_Col = n^l means all columns 
have a current GEM- interval , and that C„Col .^^p. i 

finished traversing the GEM-tree (i.e. we have hit Snd-of-File) . 

(Aside: The implementation might use an array index or pointers or 
so^e other physical representation for C_Col. We don't mean to imply 
that a GEM-tree must be represented as an" array.) 

We define three methods : Traverse_init , Traverse_iie5ct , and 
Traverse_end. 

Traverse_init : 

C_Col = 1 

ColximhCl] :c intv^ =^ ^ " _ ^ _ 

Column 1 1] .C~Ref - the set of all possible diS3uncts 
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Traverse_next : 

While C_Col > 0 and we don't have a key 'value to return 
If Col\ainn[C_Col] .C_Intv = none 

Find the first GEM- interval in this column^ { I, R }, 

such that R intersect Column [C_Col ] .C_Ref is non-empty (1) 
Column CC_Col] .C_Intv = { I, R } 
If C_Col < n 

Column tC_Col] .C_Val = first value in I (2) 
End if 

C_Col ^ C_Col + 1 
If C_Col = n+X 

Set begin key C3) 
Set end key C4} 
Else 

Column [C_Col] .C_Intv = none 
End if 
Else 

If C_Col = n 

Find the next GEM- inter-val { I, R } such that R 

intersect Column {C__Coll -C_Ref is non-empty 
If one exists • 

Column lC_Col] .C__Intv = that interval 

Set begin key (3) 
Else 

■ c_coi = c_coi - 1 
Endif 
Else 

Find the next value in Column (C_Col] .C_Intv (5) 

after Column tC_Col] .C_Val 
If one exists 

Column [C^Col] .C_Val = that value 

C_Col = C__Col -t- 1 

Column [C__Col] ,C_Intv = none 
Else 

Find the next GEM- interval { I, R } such that R 

intersect Column [n] .C_Ref is non-empty 
If one exists 

Column [C_Coll .C_Intv = { I, R } 

Coli2mn[C_Coi] .C_Val « first value 'in I 

C_Col = C_Col + 1 
Else 

C_Col = C_Col - 1 (6) 
End if 
End if 
End if 
■ End if 
End Ttfhile 

Return begin/end key or EOF indication 
Trave r s e__end : 
C Col = 0 
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(1) Note chat chere will always be at least one. This is because we 
merged in the disjuncts starting at the last column and ending at the 
first. If the GEM-cree on some column for a given disj unct is empty, 
we stop merging - so that disjunct never shows up in a reference set 
on the first column. 

C2) If I = [a,b] or [a,b) , the "first value in I" is a. If I = (a,bi 
or (a.b), the "first value in I" is the next value after a. The next 
value may be determined via either the dense or sparse algorithms. 
We don't need to compute C_Val for the rightmost coluimi, because we 
will use the current interval endpoints directly in key construction 
(with appropriate after and before flag settings) . 

(3) The begin key is composed of the concatenation of the current 
values of all but the rightmost column, and the begin point of the 
current interval I for the rightmost coliimn. If I is Ca.b] or ta,b) , 
the after flag is set. Otherwise it is not set. 

(4) The end key is composed of the concatenation of the current 
values of all but the rightmost column, and the end point of the 
current interval I for the rightmost column. If I is [a,b) or (a,b) , 
the before flag is set. Otherwise it is not set. 

(5) The next value may be determined via either the dense or: sparse 
algorithms . 

(6) Note that if C_Col goes to 0, we have reached .End -of -File on the 
GEM tree. 

The begin and end keys returned by Traverse-next can be used to 
define, a subset via DhTSTART (or DM^KEYPOSITXON) , Fetch processing 
can then iterate over this subset. Once the siabset is exhausted, 
fetch processing calls Traverse-next to find the next subset. 

3.3.2-2 Dense Algorithm - 

Vfe use the dense algorithm to traverse an interval when we think the 
number of actual values in the base table is nearly the same as the 
number of possible values in the interval. 

To obtain the next current value, we simply "add one" to it. "Adding 
one" depends on the datatype and key direction. 

If the key direction is ascending, "adding one" really is in a sense 
adding one. For integer, decimal, datetime and interval datatypes, 
we simply add one to the least significant digit-; For fixed length 
character datatypes without a collation, we treat the character 
string as a whole as one large integer, and * increment* the least 
significant byte. For varying length character strings, we treat 
them a.3 if they are blank padded out to maximum length, and then 
treat them as we would fixed length character strings. 

Note that if a column is nullable, adding one to non-null-hi must 
give us NtlLL, 
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If the key direction is descending, we subtract one instead- (There 
are other considerations for descending keys as well. See the 
section, "Descending Keys", below,') 

We do not support the dense algorithm for floating point datatypes. 
We may choose not to support the dense algorithm for some other 
datatypes . 

Collations could be supported if we use the procedure 
CPRL GETNEXTINSEQUEKCE_ to "add one." This only works for ascending 
keys" however; there is presently no procedure that goes backwards- 
For now, though, collations will not be supported in the dense 
algorithm because there is no easy and efficient method today for 
the Optimizer to determine the number of distinct possible strings of 
length n given a particular collation. (Such a method is not hard to 
come by; it's simply outside che scope of this project.) 

3.3.2.3 Sparse Algorithm. 

We choose the sparse algorithm to traverse an interval when we think 
the number of actual values in the base table is small relative to 
the set of all possible values. That is, we use it when we expect 
gaps in the ranges specified for the key column, THe Optimi^r 
detects this by analyzing the UEC and row count of the table . 

We will use che sparse algorithm always for datatypes :SUGii as float 
and interval- 

To find the next value, we open a subset with begin key using the 
current value (C_Val) with the after flag set. The end key is the 
end point of the interval (with before flag set if the end point is 
open) . We then do a DM^GBT: If it completes successfully, the value 
retrieved is the next value in the interval actually present in the 
base table - 

The pseudo-code for this algorithm is as follows; 

Set begin key CD 
Set end key (23 
Issue DM^START (or DM^KEYPOSITION) 
Issue DM^GET 
If successful 

C_yal = value retrieved for this column 
End if 

Return C_Val or EOF indication 

(1) The begin key is the concatenation of the current value (C_Val) 
of the columns to the left and this column and-hi for colximns to the 
right, with che after flag set. Given that the current column is 
column c, the pseudo-code below describes this: 
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For i <= c 

Begin key[i] =^ Column [i] .C_Val 
End for 

For i > c and i <= n 

Begin keyEil = hi 
End for 

Set after flag 

R.ecall also that hi is NULL if the column is nullable. 

(2) The end key is the concatenation of the current value CC_Val) of 
the columns to the left, and the end point of the current interval. 
(We could, in fact, scan across succeeding intervals whose reference 
sets intersect C Ref in a non-empty way, but for simplicity we do not 
discuss that optimization here.) If the end point xs excluded, we 
use lo for columns to the right and set the before flag. If it is 
included, we use hi and do not set the before flag. 

Given that the current column is column c, the pseudo-code below 
describes this: 

For i < c 

End key[i] = Column [i] ,C_Val 
End for 

End key [c] - end point of Column [c] .C_Int 

If the end point of Column [c] ,C_Int is included 

For i > c and i <= n 
End key[i] = hi 

End for 

Reset before flag 
Else 

For i > c and i <= ix 

End keyli] - lo 
End for 

Set before flag 
End if 

Once again, recall that hi is NULL if the column is nullable- 

It should be observed that the dense algorithm has no X/O cost 
when traversing to the next value. The sparse algoritlnn, on the 
other hand, does involve I/O cost: We start a subset, ana do one 
retrieval. 

3.3.2.4 An Example . 

To illustrate the algorithms cibove, we give an example. Here 
we focus on the traversal algorithm itself, largely ignoring the 
question of getting the next value in a given column- In the 
appendix, we go over the example again^ with emphasis on tb.e latter 
issue. 

Recall the example given for GEM- tree construction. We have the 
original predicate 
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CA between 1 and 100 AND 

•B = 10 AND 

C < 30) 
OR 

{A between 6S and 120 AND 

B between 10 and 20 AND 

C >= 30] 
OR 

(A in (100,200) AND 
B >= 10) 

and its associated GEM- tree 

ColviTnn A; 
Interval 
Cl,65) 
[65,100) 
[100,100] 
(100,120] 
[200.200] 

ColiTnrn B;, 
Interval 
[10,10] 
{10,203 
(20, hi] 

Col"^^ Ct 
Interval 
[lo,30) 
(30, hi] 

At Traverse- init time, we set C Col - l, and Column [l] .C_Xntv = none 
and ColuimCl] .C_Ref = { 0,. 1, 2 

on the first call to Traverse -next, we see the following processing: 

1. We scan the intervals for column A looking for the fi^st 
one whose reference set has a non-empty intersection with 
Column [1] ,C Ref. Note that since ColnmnCl] .C_Ref has all 
disjuncts in it, then all intervals qualify. So, we stop at the 
£irst one, [1,65) . 

We set' Column [13 .C_lntv = [1,65) and Column [ 1 ], C_Val to 1. 

We increment C_Col to 2, and initialize Column [2 1 .C_Xntv to none.^^ 
We set Column [2] -C Ref to the intersection of Column [1] . C_Ref ana 
the reference set associated with [1,65). So, • Colximn [2] . C_Ref 
gets {O}. 

2 We next scan the intervals of column B loo]cing for the fiys^ 
one whose reference set has a non-empty intersection with 
Column [23 .C__Ref (i.e. {O}). The first (and only.) such interval 
is [10, 10] . 



Reference Set 
0. 1. 2 } 
Ir 2 t - 
2 } 



Reference Set 

[1:1] 
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We set Column [2 ] .C_Iiitv =: [10,10] and Column [2] . C_Val to 10. 

We increment C_Coi to 3, and initialize Column [3 ] -C_Intv co none. 
We set Coluinn[3] .C Ret to the intersection of Column [21 .C__Ref and 
the reference set Associated with [10,10]. So, Column [3] . C_Ref 
gets {0} . 

Finally, we scan the intervals of column C looking- for the 
first one whose reference set has a non-empty intersection with 
Column [3 1 .C__Ref (i.e. {o}). We find the interval Clo,30). 

We set Col\iinn[3] .C_Intv - [1q,30) . 

Since C is the rightmost column, we assemble the begin and end 
keys. The begin key is Cl,10;lo) and the end key is (1,10,30) 
with the before flag set. We exit Traverse^next. 

Fetch processing now retrieves all the values between keys 
(l,lO,lo) and (1,10,30), When complete, it calls Traverse_next 
again. 

In Traverse next, 'we see that C_Col is 3, and Column t3] .C_Intv 
= [lo,30). So, we look for the next interval whose reference 
set has a non-empty intersection with Column [3 ] -C_Ref (x.e. 10}). 
There aren't any more, so we backtrack, setting C_Col =2, 

in Column B, we look for the next value in Column [2] .C_Intv ( = 
[10,10]) after Column [2] .C_yal (- 10). There aren't any, so we 
are done with this interval* 

. So, we look in Column B for the next interval after [10,10] whose 
reference set has a non-empty intersection with Column [2] -C Ref 
(i.e. {0}). There is none, so again we backtrack, setting C_Col 
= 1. 

In Column A, we look for the next value in Column [11 .C_Intv ( = 
[1,65)) af ter ColvtmnCX] -C_Val (-1)- Assuming dense algorithm, 
the next value is 2, So, we set Column [1] .C_Val to 2. 

We increment C Col to 2, and initialize Column [2] .C_Intv to none. 
We set Column [23 .C_Ref to the intersection of Coliimn[l] -C Ref and 
the reference set associated with [1,65). So, Column [2] .C_Ref 
gets {O}. 

9. We continue, as before, to Column C, returning the begin and end 
key values {2,10-,lo) and (2,10,30) with before flag set- 

10- By similar processing, we generate the subsets with begin key 
(k,lO,lo) and end key Ck,10,30) with before flag set for- k = 3, 
. . . , 64. 

11. When we backtrack to column A with Column [1] .C_Val = 64, we 
see that the interval [1,65) is exhaused and move to the next 
interval, [65, 100) - 
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This time, when we set C_Col to 2, Column [2] .C_Ref gets set to 
{0,1}. 

Vfhen we get to column C, note that Column [3 ] .C_Ref gets {0,l}. 
We generate the begin key (65,10.1o) and (65,10,30) with before 
flag set. 

On the next call to Traverse^next , we find that the interval 
[30 hi] also has a reference set with non-empty intersection with 
Column [3 ] .C_Ref. So, we generate the begin key (65, 10,30) and 
end key (65, 10, hi), this time without the before flag set. 

On the next call to Traverse_next , ve don't find any more 
intervals in Column C, We backtrack to Column B. We exhausted all 
values in the current interval, [10,10], so we look for the next 
interval whose reference set has a non-empty intersection with 
Column[2].C Ref (i.e. {0,l}).. The next such interval is (10,20]. 
Since it excludes the begin point, we set Column [2 ] .C_val to the 
next value, 11. . . 

We set C Col = 3, and Column [31 . C_Ref gets the intersection of 
Column [21. C_Ref (i.e. {0,l}) and the reference set of (10, 2G] 
(i.e, {1,2}). So, Column C3 ] .C_Ref gets {l}. 

We find the first interval for Column C whose reference set has 
a non-empty intersection with {l}. The only such interval is 
C30,hi] . So, we generate the begin key (65,11,30) sCnd end key 
(65, 11, hi). 

Similarly, we generate begin/end keys (65,m,30) and (6S,m,hi) 
for m « 12, . . . , 20, 

When we exhaust the interval (10,20] in column B, we see that 
there are no more intervals for B whose reference sets have a 
non-empty intersection with Column [2] ,C„Ref (i.e. {0,l}), so we 
backtrack again to Column A. 

Continuing in this way, we generate begin/ end keys (}c,l0,loj and 
(k,l0,30) with before flag set and (k,m,30) and (k,m,h.i) for k = 
66, , . - , 99 and m = 10, ,20- 

When we next backtrack to Column. A, we advance to the intearval 
[100,100]. This time. Column [2] .C_Ref gets [0,1,2}- This will 
cause us to traverse all of the intervals for Column B. Since (2} 
is in the reference sets on all B intervals and C intervals, we 
will generate keys for all combinations of these. 

When we next backtrack to Column A, we advance to the interval 
(100,1201. Now C6liamn[2] .C_Ref gets '{l} . The begin/end keys 
generated will be (k,m,30) and (k,m,hi) for k = 101, 120 
and m - 10, 20. The Column B interval (20, hi] i-S skipped 

because l is not in its reference sec. The Column C interval 
[lo,30) is skipped because 1 is not in its reference sec. 
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20. When we next backtrack to Column A, we advance to the interval 
[200,200], Column [2] .C_Ref gets {2}. Since 2 appears in the 
reference sets of all intervals in Columns B and C, we get all 
combinations . 

21- When we backtrack to Column A again, there are no more intervals 
so C_Col is set to 0. This results in an EOF return erom 
Traverss__next . Fetch processing terminates. 

The reader may verify that the keys we generated correspond to the 
following predicate: 

(A >= 1 and < 65 AND B = 10 AND C < 30) OR 
(A >-65 and < 100 AND B = 10) OR 

(A >==6S and < 100 AND B > 10 and <=20 AND C 30) OR. 
(A = 100 AND B 10) OR 

CA > 100 and <= 120 AND B >=10 and <= 20 AND C>=^30) OR . 
(A ^ 200 AND B >= 10) 

This is just the original predicate, re-expressed in cluster 
disjunctive normal form. 

3.3.3 Descendincr Keys 



In th.e discussion above, we have mostly assumed, for sirriplicity' s 
sake, that keys are ascending. To make the algorithms work for 
descending keys, a few changes can be made. The algorithm for 
converting atoms to intervals should read: 

Case 

Atom is X < V- 

I - (v,lo] 
Atom is X <=^ V ' 

I - Cv,lo] 
Atom is X = V 

r = [v,v3 
Atom is X v 

I « [non-null - hi, vl 
Atom is X > v 

I .= [non-null-hi ,v) 
Atom is X IS NULL 

I = [lxi,hil (here, X is nullable, and hi is NIJLL) 
End case 

We have just swapped the endpoints of the intervals (being careful to 
carry the open/ closed attributes along when swapping) . 

Also, in stage 1 processing of GEM-Tree construction, if chere are no 
predicates on the column, we generate the interval Chi,lo} instead of 
[lo,hi] . 

Finally, when building begin and end keys, we supply lo instead of 
hi and hi instead of lo for columns after the subset inte3n^al column. 
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(Note: Conv"Build^Key"Buf fer does this for us if it is told the key 
direction. ) 

In other parts of the discussion, we understand comparison to mean 
key order, so the algorithms are correct as written. 



3.3.4 Proof of Correctness 

In this section, we show that the keys generated in a GEM-tree 
traversal satisfy the original predicate, and that all key values in 
the base table satisfying the original predicate are generated by the 
GEM-tree traversal - 

Theorem CThe Jain-Leslie-Yaqhmai-Birdsall alc^o rithm) : Any predicate 
P on key columns involving comparison to- outer values and IS NULL 
predicates may be renresented as a GEM-tree, GEM-tree traversal 
generates only key values that satisfy P. It generates key values 
that can be used to retrieve all rows from the base table that 
satisfy P. 

Proof : Let P be a predicate whose atoms are comparisons of key 
columns to outer values. (By "outer value" we mean a value that does 
not depend on the table being scanned.) if some atom is of the form 
X <> value, replace it with the predicate (X < value) OR CX > value) , 

Note that any predicate P can be expressed in disjimctive normal 
form. Note also that any tree in disjunctive normal form is also 
in cluster disjunctive normal form. Therefore P can be expressed 
cluster disjunctive normal form and we can use the construction given 
in the previous sections. For the remainder of this discussion, we 
assume P is so expressed. 

Construct the GEM-tree, G from P. Let Kp denote the set of key values 
in the base table that satisfy P, and let Kg denote the set of key 
values returned via GEM-tree traversal and fetch processing. We want 
to show, then, that Kp = Kg. 

First we show Kg is contained in Kp. 

Suppose during GEM-tree traversal we return begin/ end keys (kl, 
k(n-l) , bn) and (kl, k(n-l) , en). We want to show that all rows 

in this subset satisfy P. 

Recall that bn and en come from the end points of an interval I on 
the rightmost key column. Consider its reference set R. Now, by 
our construction, the intersection of R and Column [n] . C_Ref must be 
non-empty. Let j be an element of this intersection. We will show 
that the rows from the subset satisfy disjunct j of P. 

Observe that since for all i. Column [ i +•!] ,C_Ref is the intersection 
of Column [i] -C__ref and the reference set of Column(i] ,C_Intv, then j 
must be an element of all of these, for i = 1, n. 
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Denote the first key column as A. Observe that ia an element 
of CoiuinnCl] .C_Intv, the current interval o£ column A, Denote this 
interval as J. Recall that the intervals in the final GEM tree are 
subsets of some intervals in GEM trees conscructed for each disjunct. 
In particular, since j is in the reference set of J. J must be a 
subset of some interval Jj in the GEM tree built for disjunct j . 
Thus, kl is an element of J j - 

Cc?nsider how Jj was created- In general it is the intersection 
of unions of intervals derived from atoms of disjunct j. We took 
intersections £or ANDs and unions for ORs underneath these ANDs. 
Supoose Jj - intersection of Kl^ K2 , Kk. Then, kl is an element 

of "all of these. Further, each Ki is a union of Ll, Lli of 

intervals derived from atoms. Thus, kl is an element o£ some L j i 
for each Ki. But, then, kl satisfies some atom in that OR list (the 
OR list may consist of only one atom) , for all operands of the AND 
(again the AND list may consist of only one atom, or just of an OR 
lisc) - But, then, kl satisfies the predicates in disjunct j, and 
therefore the original predicace P. 

Similar reasoning shows that -k2, k(n-l), and the values from bn 

to en satisfy P. Therefore, all rows from the subset satisfy P. Thus, 
Kg is contained in Kp. 

Next we show that Kp is contained in Kg. Let (kl, kn) be the 

key of some row in the base table that satisfies P. Then, it must 
satisfy some disjtmct of P. Suppose it satisfies the jth disjunct - 

We claim that kl is in some interval in the GEM- tree for column A for 
disjunct j. Write the predicates of disjunct j on A as 

(AtomCl] [1] OR ... OR AtomCl] [jl] ) 
AND 

AND 

(Atom Cm] [1] OR . , . OR Atom[m] [jm] ) 

Since kl satisfies P, we know that for i = 1, , m there is some 

atom Atom[i3 [j] such that kl satisfies AtomCi] [j]. These atoms give 
rise to intervals containing kl- The union of intervals representing 
Atom[i3 [1] OR ... OR AtomCi] [ji] therefore- contains kl for all 
i. Therefore, kl is in some interval in the intersection of these 
unions. That is, kl is in some interval, call it Jj , of tbe GEM- tree 
for coluann A for disjunct j , The reference set for this interval is 

{j}- 

Wiien this GEM- tree is merged into the final GEM- tree, Jj will give 
rise to one or more intervals in the final tree. Their union is 
precisely J j . In particular, j will be in the reference set for each 
of these intervals. Thus, kl is in .some GEM-interval for colxxmn A xn 
the final GEM-tree, and the reference set for that GEM-interval will 
contain j . 

Similar reasoning shows that the other key column values,-. k2, 

kn, are also elements of some GEM-interval on their respective 
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col-umns, and the reference set on each of these contains j. Denote 
these interrrals as II, 12, ... In. 

During Traverse_next , we will at some point traverse the interval II 
and therefore the value kl- When traversing the second column, 12 ' s 
reference set will have a non-empty intersection with Column [2] .C_Ref 
(since both will contain j). While traversing 12, then, we will 
traverse the value Jc2. And so on to the nth column, Xn the nth 
column, • we will generate a begin/end key pair of the form (kl,k2, 
k{n-X},bn) and (kl,k2, k(n-l),en), where bn and en are 

the endnoints of In. Since kn is contained in In, it follows that 
rows with key values (kl,--.,kn} will be retrieved in tnis subset. 
Therefore, Kp is contained in Kg. 

Having shown containments in both directions. Kg = Kp and the theorem 
is proven. 

QED. 



3.3.5 Further Refinement 

In this section we discuss some refinements that are in discussion 
at this writing but not completely thought out. These will be 
implemented if time permits, 

3.3.5.1 Disjuncts Lacking Predicates on Rightmost Keys. 

The algorithms given do suffer from (at least) one ineff icency. If 
some disjunct does not have a predicate on the rightmost column, we 
will needlessly start a subset for each value in the next column to 
the left. We could instead simply treat the next column to the left 
as the rightmost column in this case - creating a single subset for 
each interval in that column. 

This optimization could be carried to 'n levels: if there are no 
predicates on the rightmost n columns in some disjunct, we can do 
subset processing directly on intervals in the next column to the 
left. 

The basic idea is as follows: 

Given, a disjunct j , let Xj be the rightmost key column that has a 
predicate. In the GEM- tree for Xj on disjunct j, set a flag in all 
the intervals saying these are subset intervals - 

In the stage 2 algorithm, any interval generated from subset 
intervals is also a subset interval- So, in general, subset 
intervals can arise in several columns. (Indeed, we can think of all 
the intervals on the rightmost column as being subset intervals.) 

In the traversal algorithm, we treat any sxibset interval as a subset, 
instead of traversing the values in that interval. That is, we don't 
traverse to the next column if the subset . flag is set. Instead we 
supply hi and lo values for the missing key colximns. 
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A complication arises if in Stage 2 an interval is derived from the 
intersection of a subset interval and a non- subset xnterval. We 
can use it as as subset only for the disjunct that it is a subset 
interval for. So, we need more than just a subset flag; we need at 
least to keep a second reference set detailing which disjuncts the 
interval is a subset interval for. 

It is not yet clear that this is sufficient, however. Wore thought 
needs to go into understanding this refinement. 

This refinement will be implemented if time allows. 

3.3.5.2 Leverage from Sparse Algorithm. 

If we use the sparse algorithm on some column X, we will also get 
values back for the columns to the right of X when doing the DM^GET. 
We can use these values as starting points when we traverse to 
columns after X. This may somecimes save us from generating keys that 
don't exist. 

3.3.6 Non- Key Predicates 

The GEM- tree algorithms generate keys for retriving values -from a 
base table. It may be that there are non -key -predicates involved as.- 
well. We can support these easily by generating a base table EXD 
representing the entire predicate, and sending that to DP2. 

If the original predicate has the form 

(Key predicates) AND (Non-key predicates) 

we can, as an optimisation, generate an EXD for just ttie non- key 
predicates since they apply independently on all rows. In 
particular, if there are no non-key predicates at all {tbat is, if 
all the predicates are key predicates) , we can dispense with, the EXD 
altogether. 



3.4 Optimizer Processing 

The optimizer rciust take a parse tree and recognize when GEM index 
use extensions will result in a more cost effective way of accessing 
data. The GEayi extensions consists of allowing range predicates used 
in an index to be followed by other key colximn predicates, and allow 
the general use of OR index access to be extended to joins.. 

The two extensions highlight the two major limitations we have today 
in the use of indexes. We currently only consider using additional 
parts of the key if the previous part of the key used only equality 
predicates, and do not consider the case of missing prediates at all- 
In addition OR index processing is resticted to when only one table 
is specified in the FROM clause. 
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Ttie optimizer will create the equivalent of disjunctive nonnal form, 
in a way that uses little memory. It assigns predicates to clusters. 
All predicates connected by only ANDs form a cluster, A single 
column IN list will also form a cluster {an IK list is a set of 
predicates, all on the same colximn, and ORed together, e.g. A IN 
(1,2,3) OR A-7 is equivalent to A=l OR A-2 OR A=3 OR A-7) . 

Each cluster is represented by a cluster structure. Each predicate 
info table will point to the cluster entry it is a part of. Each 
cluster can be ORed to other clusters, and the result can be ANDed or 
ORed with other sets of clusters. We will materialize each disjunct 
of the expression (disjunctive normal form) and add the disjunct 
number to a list hanging off the cluster entry. We do not keep 
the disjunct after materializing it. Now we know which px-edicates 
participate in which disjuncts. 

A paper describing this process is in $sqldoc,gemdGC . disjunct- It is 
in PC format and can be printed by 

FUP COPY \tess.$sqldoc-gemdoc. disjunct, $s, ^^pscrpt? , unstnictured 
The ? in $s.#pscrpt? will be the printer number. 



3 . 4 . 1 Use of Predicate liists in the Optimizer 

A search tree exists for every index of every table in the query . 
These are combined in various ways to do joins. Eacii time a new 
search tree is created a list of ■ predicates appropriate to that 
search tree is created - 

Each entry in this list points to a predicate info table entry, which 
in turn points to the predicate. In addition this predicate info 
entry will now point to a cluster entry which has a list of disjuncts 
this predicate participates in. In order to find out if we^ have a 
usable predicate for a particulat disjunct we only have to find it in 
this ordered list of disjuncts. 



Search Tree 

ptr to local 
predicate list 



V 

Local pred list 

ptr to pred info 

ptr to next entry 



Pred Info 
ptr to pred 
ptr to cluster 



--> A = 6 
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cluster 
ptr to disjuncts 



Disjunct # 

ptr to next 
disjunct 



3.4.2 Handling of Rancfe Predicates in t he Optimizer 

The way range predicates are handled will be changed for this 
protect . Begin and End range predicates will be recognized f or. each 
search tree's predicate list. These will be pointed to each other 
through the predicate info table. Wherever selectivxty X3 ca.Xculated 
for these predicates in the optimizer will be changed to use the 
following fonmila when constants are used to limxt the range: 

sell + sel2 - notnull'^selectivity 

The begin/end selectivity will only be calculated for the > or 
operator. 

The search key array will be changed to point to the > or >= 
predicate- only. (Currently it will add both to the end of the key 
list.) Now the key list will contain only one reference per column. 



3.4.3 Building the Path Fred Lists 

The path pred lists in the optimizer are a list of potential 
predicates for an index. Currently we stop building after we process 
a column that has not equality predicate for it. We _W3.11 now 
continue to the last key column, even- if no predicate exists for a 
column.. ■ ' • 

Single coliiim OR list type predicates will be chained together, so 
they appear similar to a single predicate in the list. 

Range predicates are also chained (although through the pred info 
table) /so they also appear similar to a single predicate in the 
list. 



52 



EP0 747 839 A1 



General Expression Method 



3.4.4 Choosing Predicates for an Index 

Each search tree represents a path (index) . For each search tree we 
try to assign index predicates and cost that use of the index. 

There are two aspects to thLXS. One involves using multiple ranges 
and single column OR lists for a key. The other lnvoi^^es costing 
each disjunction separately. 

3.4.5 Multiple Ranges and Singrle Col uinn OR Lists in a Key 

The optimizer will use costing to determine how many ^eys of the 
index can be used using GEM. We will always get the pre GEM cost 
first and then add GEM extensions in several steps, to ensure that 
each key coluinn added for GEM will result in a cheaper index. 

For example consider the following key predicates (key is on columns 
a, b, c and d) : 

WHERE a BETWEEN (1 and 5) AND b = 5 AMD c IN (1,5,7) AND d<4; 

The pre GEM index selection will . only include the BETV?E^ 
for column a. After this has been cos ted by the optimizer, additional 
key predicates will be added to the key list. The b=5 and c 
IN (1,5,7) predicates will be added- This will be costed by the 
optimizer and the cost cort^ared to the pre GEM cost- If 2.t is still 
cheaper we will try to add the last predicate to the key list and 
cost it again. If still cheaper we will let this stand as the key- 

3.4.6 Handling Disjunctions in the Optimizer 

- The predicate tree for the current query has been processed to sec 
UP a list of disjuncts off each cluster. If there is more than one 
disjunct, then for each* index we must go through all the disjuncts 
(up to the pre GEM costing limit) . 

•Within each disjunct a key can have ^multiple ranges and in lists. 
These are processed within each disjunct. 

When we have disjuncts we still must find the pre GEM cost. The cost 
here Involves only predicates that are common to all disnuncts. 

After we have the pre GEM cost we now will process ^ each disjunct 
separately. The total cost for all disjuncts will be .accumulated. 
Whenever the total coat for all disjuncts equals or exceeds the pre 
GEM cost we quit and use the pre GEM index. . 

The optimizer will need to take care of the following for each 
disjunct: 
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* Calculating Selectivity ( opta^ table'sel ectivity) : 

Only predicates for the current disjunct can be considered. 

Disk process predicates can only be predicates that are in all the 
disjuncts . 

* Determining Locking and SBB: 

Sequential Block Buffering must be separately determined for 
each disjunct. For each disjunct check if a different type of 
sequential block buffering was chosen. If so keep only one type 
with following priority: Real SBB, Virtual SBB, No SBB. (Real 
will be chosen over Virtual SBB , ) 

The type of locking can only be decided after we have determined 
the total niimber of records that will be returned from ttie disk 
process. After all the disjuncts have been processed a new 
determination of locking will be made. 



3.4,7 Implementation - Choosing Index Predicates 

I will add the call to join costing to opta^tahle^cost and change 
this procedure to a loop that does the following: 

* Cost the index as it currently would be today. 

* Check if more keys could be added to this using gem- 

Loop over the disjuncts, by passing a disjunct number to the 
procedure that ciiooses predicates for the key. 

For each disjunct start with the last set of predicates for the 
key and try to add more key columns. Stop each time a new 
range is added. 

If any key columns have been added cost it and compare it to 
the last set. If the previous set was cheaper, revert to it 
and go on to the next disjunct- 

. If this use of the index is cheaper than the last continue to 
try to add key columns. 

Reuse the search key array for each disjunct. 

For each disjunct check if a different type of sequential block 
buffering was chosen If so kefep only one type with following 
priority: Real SBB, Virtual SBB, No SBB. (Real will be chosen 
over Virtual SBB.) 

. For each disjunct add the cost of the previous disjunct to a 
running total- Stop when the cost exceeds pre GEM costing. 
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Some disjuncts may be logically impossible and should not be counted 
as accesses. (The executor will eliminate logically impossible 
disjuncts . ) 

The optimizer will eliminate disjuncts that it believes will probably 
not result in a retrieval. {These disjuncts are only eliminated for 
costing purposes.) 

The optimizer will look for multiple eofual predicates on the same 
column. If these are compared^ to different constants or host 
variables we will eliminate the disjunct . 

When a GEM key has a sparse range in it a bit will be set in the 
search tree to tell the optimizer to return the key columns in a 
read . 

GEM will be always be allowed for nested joins, but not for KS merge 
joins- 



3.4-8 Costing 

There are two costing routines that are affected (opta'table'cost and 
opta^join'^cost'^nested) . Both of these will be modified to handle 
ranges that are followed by other predicates and. single column OR 
lists. 

A range that has following predicates will be costed by calculating 
the number of unique values within the range (selecnivity times 
unique entry count). This is the number of positions we must do. 
A single column OR list will be costed by cotinting tb.e number of 
equal predicates in the OR list- For any single column OR list 
predicate that is not an equal operator we must estimate the number 
of positions we must do. The total of equal predicates and estimated 
postitions is the number of positions we must do. We multiply the 
number of positions for each column together. We then multiply this 
times how much it costs if we have equals for the range and in list 
columns , 

There are two types of positionings. One occurs when the values for 
the colimn are densely packed and numeric (not floating point) . In 
this case the executor will just add one to the value. The other 
case occurs when the data is not densely packed. In that case we 
do a keyposition probe to find the next value for the column and 
then another keyposition to find the next row we want- The sparse 
algorithm is twice as costly as the dense algorithm. 

This extra cost for the sparse algorithm needs to be added to the 
cost. However since the same predicate could appear in. the leading 
column multiple times we do not need to cost this multiple times- We 
will check the clusters disjunct list to see if an earlier disjunct 
had already costed this leading range or if any disjunct haa no 
predicate for this column (meaning we have to go thorough all the 
values in the column anyway) . 
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There is a fixed cose to the executor for" building the GEM trees. 
This cost will be a function of the number of disjuncts and key 
colximns used. For joins this cost must be multiplied by che number 
of outer records. 



3.4.9 Use of Search Key Array. 

■The search key array is used in several places in the compiler Most 
of these places will need minimal changes to handle GEM additions. 
In general a join that uses GEM features does not preserve order, 
while an outer table that uses GEM does preserve order. 



3.4.10 Interface with the Generator 

The creation of the Begin/End Key structures is different^ with^ GEM. 
First we did not save all the search key arrays, only the last one. 
Also the search key array has only selected predicates not ail the 
predicates for a column. The disjuncts must be recreated and all the 
predicates for a column listed. 

The begin/end key structure for GEM will point to a three tiered 
structure. The first tier is general information about the column, 
it contains pointers to. the next- tier and__ back to the f xr^c tier jEor, 
each disjunct- 

The second tier contain compact list of pointers to predicate chains. 

The third tier contains list of predicates for a single column that 
ia in a single cluster. The second tier links multiple clusters 
together. In this way if a predicate is used m multiple diSDuncts 
less space is required. 

Begin End Key 

col 1. 1 col 2 \ col 3 I etc. ~ 



Tier 1 

y - 

T . Data type info 
, pointer to tier 2 

- pointer to tier 1 for next disjunct 

- buffer addresses 
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Tier 2 



. pointer to tier 3 
. pointer to next Tier 2 entry 
for disjunct and coliimn 



Tier 3 

V 

. operator 

. pointer to source expression 

- pointer to value part of expression 

. pointer to next Tier 3 entry 



3.4.11 Cross Products with Lea ding Rancres 

cross products in parallel are materialized^by having the data for. 
an outer partition combined with ail the data from each talple in the 
cross product. If an inner tatole had a leading range on xt we would 
have to look through the entire range for each cross product. 

An alternative way to look. at this is to do a coatplete cross product 
using the entire outer table each time. Normally the table that has 
esps started for each partition is the outer table. However, m this 
case we want the outer table to have the cross product a^d the table 
that we are joining the cross product to will be the one that drives 
the esp assignment- In this way we only have to apply the complete 
cross produce to a single partition of the large fact table. 

The compiler will do a test casting of both ways of doing this and 
choose the cheaper. 



3.4.12 Explain 

The predicates for each disjunct will be shown. The begin/end foi 
will not be used, instead the disjunct and the expressions valid 
that disjunct will be displayed. 

3.4.13 Interaction with Current OR Optimization 

The two types of OR opcimization will not be mixed initially, 
will be costed independently and the cheapest will be chosen. 
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Evencually we wane to integrate the two. The way we expect this to 
be done is by costing ail the possible disjuncts for each index, and 
choosing the cheapest index for each disjunct- Then each index will 
have a set of disjuncts associated with it. Each xndex will then 
be a GEM index. The creation of the ISFALSE expressions is a little 
more complex than it was in the way this was previously done. 

The current OR optimization would choose a different scan for each 
disjunct. It would only choose OR optimization if the predicates 
were already in disjunctive normal form. The revised OR optimization 
would not require the predicates to be 'stated rn disjunctive normal 
form, since the new design creates disjunctive normal form. In 
addition there will only be one scan per index. 



3 . 5 Design Assumptions 

Optiraizer will need to ensure that the key columns are part of the 
select list. 

Need to talk about: 

locking and dm^key position 

costing . ^ ^ 

predicate formats (rarity of non- overlapping ranges for tJie same 
column) 

— versioning 



3.6 Design Restrictions 

* The GEM optimization will not apply to queries where there is a 
disjunctive OR on a non-key column. For example, for a table 
TCA,B,C) where the key is CA,B) , the following query cannot use 
.this optimization: 

<A BETWEEN 1 AND 100 AND B = 10) OH 
B IN (1,2,3) OR 
C = 100 

Because the predicate on C will result in a full table scan for 
. values of a non-key column. 

* All data scans will be through the same index, unlike OR 
optimization where different scans through the table caja be ' via 
distinct indexes. This feature however allows for- support of DP2 
aggregates and GROUP BY in queries where GEM optimization is used 
in place of OR. optimization* 
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4.0 SPECIAI. CONSIDERATIOiTS 
Need to talk about: 

- - performance and recjuireinents which need to be met, 

versioning and backward ' cornpatibility with downrev- systems . 
-- any (if any) issues with parallelism?? concurrency? locking? 



5.0 COHSIDERATIOK5 FOR FTJTPRE ENH&yCEMSHTS 

Once GEM has been £ully implemented and tested* _ moving the 
Implementation to the disk process layer should be considered (refer 
to Section 9.0). This enhancement will allow for an even^ better 
performance by. pushing key building routines and algorithm for 
retrieving values from ranges down to DP2 . 

In the future a faore effective use of VSBB may be considered- as an 
added advantage of the DP2 implementation. 



€ . 0 TESTING 

This section should describe any special testing issues, as well 
as a general outline of testing required to ensure a quality 
implementation. This information sho\xld not duplicate tixe Unit Test 
Plan, but instead give a higher level abstraction and point out amy 
special system level testing concerns. 



7,0 OPKN XSSTJES 

* How will key predicates and joins containing column values be 
handled? Can they be handled? 

Example: Tl-A = T2.A OR 
Tl.A = T2.B 

* VThat are the special considerations for ordered columns and 
columns which have collations defined on them? 

* How will NULL predicates be handled? 

* If key columns are in different order (some are ascending and 
some descending) , expression evaluator will need . to fiddle with 
the EXT'^FIRST'KEY^TYPE flag accordingly in -cases of overflow and 
underflow. Currently this flag gets set statically, but it will 
need to change dynamically for GEM. 

* Depending on -how GEM- tree information is stored and retrieved, 
a SORT may be necessary (i.e. if key column is descending 
and GEM- tree keys are being retrieved in ascending order, the 
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rertriev-ed rows will need to be sorted) . Must look into 
optimizing this if possible . 



B.O CLOSED ISSUES 

* Multi -Value Predicates (MVPa) will always be transformed to their 
disjunctive normal form, whether they are in the terminating 
column (s) or not. This is so that we don't have to teach the 
duplicate elimination routine to learn about MVPs and collapsing 
such ranges . 

* Can data retrieved from the sparse algorithm not be analyzed and 
instead simply move key values needed to key buffers?? Downside 
is that some rows maybe retrieved more than once. 

NO. Retrieved data needs to be analyzed so it can be decided 
which predicate to .associate with the current key being built. 
This will not be necessary if the entire predicate will be applied 
to the retrieved rows at the executor layer, but then the number 
of rows handled could be significantly more. 

* Will LIKE always be transformed to a range and what in cases it 
cannot be (e.g. wild cards)? 

In passible cases where 'the wild card is not in the leading 
portion of a constajit like pattern, and no collations are 
specified for that predicate, the predicate will be transformed 
into a range. In all other cases (patterns stored, in host 
variables or parama or when collations are used) , th.e like 
predicate will become a non-key predicate. 

* NOT predicates will be transformed into non-key predicates- 

* How will non-key predicates be handled? 

It was originally planned to associate non-key predicates to key 
predicates via a tag or identifier and to dynamically link them 
together as needed when ranges were collapsed. A new design 
is under consideration now. The non-key predicates and key 
predicates will be generated and will be sent to DP2 in their 
entirety and will be evaluated to eliminate unnecessary rows. 
There is no longer a need to associate non-key predicates with a 
given key predicate set. 

* There are no issues with locking and supporting DP2 aggregates. 

* How can predicates be combined together dynamically? If use 
of Fast Expressions is necessary, optimizer will need logic to 
estimate size of expressions such that we have a work around for 
those who get too big. 

In the first phase of this project, we will use fast expressions 
and will send the entire expression dovm . to • the DP2 layer. In 
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future r we may want to consider alternatives to optimize this 
process . 



9 . 0 DESIGN AI.TERKATIVES 

Xnitialiy, there was a plan to implement this new access method in 
the DP2 layer (for a detail of the design please refer to [5] ) . 
This plan was reconsidered and the design has since changed for the 
following reasons: 

* With the DP2 approach, we could only support BROWSE access within 
the given timeframe and . thus could not extend this new access 
method to include UPDATE and DELETE .DML statements. ^Wit.h the new 
design, DM^KEYPOSITION can be used to mimic the desired behaviour 
of materializing existing rows within a rauige and will allow a 
work around for the lockiing issues. 

* Because of a time constraint to make this enhancement available by 
April 1995, it was decided to reduce the involvment of different 
components and keep the changes centralised in the SQL compiler 
and executor area. The other design will need support from BP2 
and FS2 as well and will result in DP2 introducing a new dialect, 

* Although the performance gains of - the -new design will not be 
as significant as the later, it will atill be notlcable gains 
compared to current behaviour. 



61 



EP 0 747 839 A1 

General Expression Method 

10.0 AgPEkoiX A 

Suppose we have a table TCA,B,C,D) with key columns {A,B,C), with the 
following contents: 
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Consider the predicate example given in section 3.3 (here shown in 
the form suggested by the GEM- tree) : 

A 1 and < 65 AND B = 10 AHD C < 30 
A >=- 65 and < XOO AND B « 10) 

A >- 65 and < 100 AND B > 10 and <- 20 AND C >« 30 
A = 100 AND B > - 10 

A > 100 and < = 120 AND B >= 10 and <= 20 AND C 30 
A - 200 AND B >- 10 

We will show in detail the use of the sparse and dense algorithms. 
Please note that this is example merely illustrates the algorithm 
and doesn't- necessarily exhibit the performance gains from thxs new 
access -method. 



10-1 Sparse Algox-ifchm 

Here we show the use of the sparse algorithm. 



OR 
OR 
OR 
OR 
OR 
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A subset will be started for table T, with the first key from the 
GEM tree describing the above expression. The random flag- of the 
DM block will be set so to allow random access to the file. 

At cursor open time, after building the GEM-tree, we traverse 
it and get initial key values. At first fetch, we retrieve the 
first row: 
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B 
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10 
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10 
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10 


30 











begin- key 
* end -key 

The columns on the left signify the key set up for the DM^START 
and DM'GET call and the columns on the right are the result of 
the DM'GET call. A * next to begin-key means the DM'*AFTER flag 
is set, and one next to the end-key signifies that DM'^BEFORE is 
set. 

Because there is a range specified on the last key column, we 
will continue to retrieve rows on later fetches until all values 
in the range are exhausted and we get .an EOF: 

.--> 1 10 7 1 

._- — >.. ...EOF. 



Note that its not necessary to find the next value in the range 
for coluimi C. 

With EOF, we know the range on C has been exhausted. We traverse 
the GEM- tree some more to find the next key range. We see that 
there are no more intervals for C for the current values of A 
and B. We backtrack to and see that there are no more for B 
either- So, we backtrack to A, 

Here we use the sparse algorithm to find the next value of A in 
the interval [1,65). We refer to this as probing for column A . 

On a separate DM block (which lacks any base table predicates) , 
we start a subset as shown below and retrieve a row: 
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65 
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* B-key 

* E-key 

Note that we set the DM^AFTER flag to avoid retrieving the value 
A =1. Note that we set 'DM^BEFORE, and use lo, lo for .3 and .C in 
the end key, because the ending predicate is A < 65. (Were it A 
<= 65, ws would not set DM^BEFORE, and we would use hi, hi for B 
and C in the end key . ) 

We use the value 3 as the current value for A, and traverse on to 
column B. The only matching interval is [10,10] . We traverse on 
to colijmn C. We find the matching interval [lo,30). 
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B-key 
* E-key 

Note that we happened to retrieve the saiae row a second time. 
In general, though, . the row retrieved when probing for A will 
not satisfy the original predicate. (For example, if the row 
{3,9,10,10) were present in the table, we would have retrieved it 
when probing instead-) The point of the probe phase is to find 
the next value of A efficiently, not to do a table scan until we 
find the next qualifying row. In general, we would expect many 
data blocks between the row retrieved while probing and the row 
retrieved when f etching . 

On the next fetch, we get EOF, and we traverse the GEM- tree 
again, in the same manner. 



10.2 Dense Algorithm 

Here we go over the same, example again ^ this time using the dense 
algorithm. 

1. At cursor open and first fetch time, we start our first subset 
and retrieve the first row as before: 



B-key 1 10 la > 1 10 1 1 

* E-key 1 10 30 

2. On the next fetch, we get EOF. So, we traverse the GEM- tree 
some more to get the next key range. As before,- we backtrack to 
column A. We find the next value of column A by adding one to 
the current value (obtaining th.e value 2) . We then traverse to 
columns B and C as before, and obtain a new sxibset- Returning to 
fetch processing, we do a DM'*'KEYPOSITION an'd a DM^GET, obtaining 
EOF; 

ABC 

B-key 2 10 lo > EOF 

* E-key 2 10 30 ^ . 

3. Because there were no matches, we traverse the GEM- tree some 
more. We try again using 3 for A: 
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B-key 
* E-key 

On the next fetch, we once again obtain EOF and go through 
GEM- tree traversal again. Future fetches follow similar 
processing. 
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Claims 

1. A method of searching a database using individual multi-column search keys constructed in accordance with a 
search query specifying predicate values corresponding to search key column values, said method comprising the 
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steps of: 

(a) evaluating the predicate values specified by tine search query; 

(b) for each predicate value specified as a range of values, assigning equivalent range values to the corre- 
sponding key column; 

(c) for each predicate value specified as an IN list of values, assigning equivalent values to the corresponding 
key column; 

(d) for each unspecified predicate value, assigning a range of values to the corresponding key column; and 

(e) using the equivalent values from steps (b) - (d) to construct individual search keys. 

2. The method of claim 1 wherein a search query includes a plurality of disjuncts; and wherein said step (a) of evalu- 
ating includes the step of assigning a unique disjunct number to each of said plurality of disjuncts. 

3. The method of claim 2 wherein said step of assigning disjunct numbers includes the step of assigning an individual 
disjunct number to an IN list predicate. 

4. The method of one of the preceding claims wherein said equivalent range values assigned in step (b) are specified 
as the !o and hi values of the con-esponding key column. 

5. The method of one of the preceding claims wherein said step (a) of evaluating includes the steps of associating 
predicates to clusters and associating clusters to disjunct numbers. 

6. The method of one of the preceding claims wherein said step (e) of using includes the step of eliminating conflicting 
predicate values from the same column to reduce the number of potential search key values for that column. 

7. The method of one of the preceding claims wherei n said step (e) of using includes the step of eliminating redundant 
values specified by the disjuncts to minimize the number of search keys required to fulfil! the search query. 

8. A computer system for storing and providing user access to data in stored database objects, said system compris- 
ing: 

a memory for storing said database objects; 

a communications interface through which user database queries are relayed to said computer system from a 
user workstation, and through which query results from said computer system are made available to the user 
workstation; and 

a processor for controlling interactions between said memory and said communications interface in response 
to a search query received from a user workstation and spedfying predicate values corresponding to search 
key column values: and 

a computer program for causing said processor to construct individual multi-column search keys in accordance 
with a search query specifying predicate values corresponding to search key column values, said computer 
program including a first procedure for evaluating the predicate values specified by the search query; 
a second procedure for assigning equivalent range values for each predicate value specified in the search 
query as a range of values, to the corresponding key column; 

a third procedure for assigning equivalent values for each predicate value specified in the search query as an 
IN list of values, to the corresponding key column; 

a fourth procedure for assigning a range of values to the con-esponding key column for each unspecified pred- 
icate value; and 

a fifth procedure for constructing individual search keys from the equivalent values assigned by the first through 
fourth procedures. 

9. The system of claim 8 wherein a search query includes a plurality of disjuncts; and wherein said first procedure 
causes said processor to assign a unique disjunct number to each of the plurality of disjuncts in the search query. 

10. The system of claim 9 wherein the first procedure causes said processor to assign an individual disjunct number to 
an IN list predicate. 

1 1 . The system of one of claims S to 10 wherein the equivalent range values assigned by the processor in response to 
the second procedure are specified as the lo and hi values of the coiresponding key column. 
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12. The system of one of claims 8 to 11 wherein the first procedure causes said processor to associate predicates to 
clusters and associate clusters to disjunct numbers. 

1 3. The system of one of claims 8 to 1 2 wherein the fifth procedure causes said processor to eliminate conflicting pred- 
icate values from the same column in order to reduce the number of potential search key values for that column. 

14. The system of one of claims 8 to 13 wherein the fifth procedure causes said processor to eliminate redundant val- 
ues specified by the disjuncts in order to minimize the number of search keys required to fulfill the search query. 

15. A computer program product comprising a computer usable medium having computer readable code embodied 
therein for constructing individual multi-column search keys for searching a database, the search keys being con- 
structed In accordance with a search query specifying predicate values conesponding to search key column val- 
ues, the computer program product comprising: 

first computer readable program code devices configured to cause a computer to evaluate the predicate values 
specified by the search query; 

second computer readable program code dej/ices configured to cause a computer to assign, for each predicate 
value specified as a range of values, equivalent range values to the corresponding key column; 
third computei- readable program code devices configured to cause a computer to assign, for each predicate 
value specified as an IN list of values, equivalent values to the corresponding key column; 
fourth computer readable program code devices configured to cause a computer to assign, for each unspeci- 
fied predicate value, a range of values to the corresponding key column; and 

fifth computer readable program code devices configured to cause a computer to use said equivalent \ralues to 
construct individual search keys. 

16. The computer program product of claim 15 wherein a search query includes a plurality of disjuncts; and wherein 
said first computer readable program code devices are configured to cause a computer to assign a unique disjunct 
number to each of said plurality of disjuncts. 

1 7. The computer program product of claim 16 wherein the computer is caused to assign an individual disjunct number 
to an IN list predicate. 

1 8. The computer program product of one of claims 1 5 to 1 7 wherein the equivalent rang e values assigned by the com- 
puter in response to said second computer readable program code devices are specified as the lo and hi values of 
the corresponding key column. 

19. The computer program product of one of claims 15 to 18 wherein said first computer readable program code 
devices are configured to cause the computer to associate predicates to clusters and clusters to disjunct numbers. 

20. The computer program product of one of claims 1 5 to 19 wherein the fifth computer readable program code devices 
are additionally configured to cause the computer to eliminate conflicting predicate values from the same column 
to reduce the number of potential search key values for that column. 

21. The computer program product of one of claims 15 to 20 wherein said fifth computer readable program code 
devices are additionally configured to cause the computer to eliminate redundant values specified by the disjuncts 
to minimize the number of search keys required to fulfill the search query. 
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